Appending unique timestamps to a row of columns based on another column value

Good afternoon Community,

I currently have a large file with several headers which include notably fix_timestamps and id. The data is formatted like this.

fix_timestamps	id
2023-08-01 00:02:52.527	WPA54
2023-08-01 00:02:52.527	WPA54
2023-08-01 00:02:52.527	WPA54
2023-08-01 00:10:10.640	WPA54
2023-08-01 00:10:10.640	WPA54
2023-08-01 00:10:26.937	WPA54
2023-08-01 00:10:26.937	WPA54
2023-08-01 00:10:10.640	IBT675
2023-08-01 00:10:10.640	IBT675
2023-08-01 00:10:10.640	IBT675
2023-08-01 00:10:26.937	IBT675
2023-08-01 00:10:26.937	IBT675
2023-08-01 00:02:52.527	IBT675
2023-08-01 00:02:52.527	IBT675
etc...etc.

I would like to be able to read through my file and for each value for the length id column, look for unique times in the fix_timestamps column and append these times in a new fix_timestamp, delimited by ' ; '. Essentially generating a dataframe output that looks like this:

fix_timestamps	id
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937	WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937	WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937	WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937	WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937	WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937	WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937	WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937	WPA54
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527	IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527	IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527	IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527	IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527	IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527	IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527	IBT675

My current script is able to follow some of the logic but not completely and I have been trying to figure out why

import pandas as pd

# Read the CSV file
file_path = 'input.csv'
df = pd.read_csv(file_path)

# Function to append unique timestamps based on ID length
def append_timestamp(row):
    id_length = len(row['id'])
    timestamps = []

    # for i in range(id_length):
    #     timestamps.add(row['fix_timestamps'])
    # return ';'.join(timestamps)

    for i in range(id_length):
        timestamps.append(row['fix_timestamps'])
    return ';'.join(timestamps)

# Apply the function to the DataFrame rows
df['fix_timestamps'] = df.apply(append_timestamp, axis=1)

print(df)

# Save the DataFrame to a CSV file
output_file_path = 'output'
df.to_csv(output_file_path, index=False)

Using the above input example and running it through the current script, my output essentially generates this:

fix_timestamps	id
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;.....	WPA54
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;....	WPA54
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;....	WPA54
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;.....	WPA54
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527....	IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527....	IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527....	IBT675

So it seems to just repeat the timestamp at each unique time and append the same value when I would like to do it for all unique times for the id.

Thank you all

Solution

You can convert to string (with astype), then use a custom groupby.transform and unique:

df['fix_timestamps'] = (df['fix_timestamps'].astype(str).groupby(df['id'])
                        .transform(lambda x: ';'.join(x.unique()))
                       )

Output:

                                       fix_timestamps      id
0   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
1   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
2   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
3   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
4   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
5   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
6   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
7   2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
8   2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
9   2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
10  2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
11  2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
12  2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
13  2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675