Good afternoon Community,
I currently have a large file with several headers which include notably fix_timestamps and id. The data is formatted like this.
fix_timestamps | id |
---|---|
2023-08-01 00:02:52.527 | WPA54 |
2023-08-01 00:02:52.527 | WPA54 |
2023-08-01 00:02:52.527 | WPA54 |
2023-08-01 00:10:10.640 | WPA54 |
2023-08-01 00:10:10.640 | WPA54 |
2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:10:10.640 | IBT675 |
2023-08-01 00:10:10.640 | IBT675 |
2023-08-01 00:10:10.640 | IBT675 |
2023-08-01 00:10:26.937 | IBT675 |
2023-08-01 00:10:26.937 | IBT675 |
2023-08-01 00:02:52.527 | IBT675 |
2023-08-01 00:02:52.527 | IBT675 |
etc...etc. |
I would like to be able to read through my file and for each value for the length id column, look for unique times in the fix_timestamps column and append these times in a new fix_timestamp, delimited by ' ; '. Essentially generating a dataframe output that looks like this:
fix_timestamps | id |
---|---|
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 | WPA54 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 | IBT675 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 | IBT675 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 | IBT675 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 | IBT675 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 | IBT675 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 | IBT675 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 | IBT675 |
My current script is able to follow some of the logic but not completely and I have been trying to figure out why
import pandas as pd
# Read the CSV file
file_path = 'input.csv'
df = pd.read_csv(file_path)
# Function to append unique timestamps based on ID length
def append_timestamp(row):
id_length = len(row['id'])
timestamps = []
# for i in range(id_length):
# timestamps.add(row['fix_timestamps'])
# return ';'.join(timestamps)
for i in range(id_length):
timestamps.append(row['fix_timestamps'])
return ';'.join(timestamps)
# Apply the function to the DataFrame rows
df['fix_timestamps'] = df.apply(append_timestamp, axis=1)
print(df)
# Save the DataFrame to a CSV file
output_file_path = 'output'
df.to_csv(output_file_path, index=False)
Using the above input example and running it through the current script, my output essentially generates this:
fix_timestamps | id |
---|---|
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;..... | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;.... | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;.... | WPA54 |
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;..... | WPA54 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527.... | IBT675 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527.... | IBT675 |
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527.... | IBT675 |
So it seems to just repeat the timestamp at each unique time and append the same value when I would like to do it for all unique times for the id.
Thank you all
You can convert to string (with astype
), then use a custom groupby.transform
and unique
:
df['fix_timestamps'] = (df['fix_timestamps'].astype(str).groupby(df['id'])
.transform(lambda x: ';'.join(x.unique()))
)
Output:
fix_timestamps id
0 2023-08-01 00:02:52.527;2023-08-01 00:10:10.64... WPA54
1 2023-08-01 00:02:52.527;2023-08-01 00:10:10.64... WPA54
2 2023-08-01 00:02:52.527;2023-08-01 00:10:10.64... WPA54
3 2023-08-01 00:02:52.527;2023-08-01 00:10:10.64... WPA54
4 2023-08-01 00:02:52.527;2023-08-01 00:10:10.64... WPA54
5 2023-08-01 00:02:52.527;2023-08-01 00:10:10.64... WPA54
6 2023-08-01 00:02:52.527;2023-08-01 00:10:10.64... WPA54
7 2023-08-01 00:10:10.640;2023-08-01 00:10:26.93... IBT675
8 2023-08-01 00:10:10.640;2023-08-01 00:10:26.93... IBT675
9 2023-08-01 00:10:10.640;2023-08-01 00:10:26.93... IBT675
10 2023-08-01 00:10:10.640;2023-08-01 00:10:26.93... IBT675
11 2023-08-01 00:10:10.640;2023-08-01 00:10:26.93... IBT675
12 2023-08-01 00:10:10.640;2023-08-01 00:10:26.93... IBT675
13 2023-08-01 00:10:10.640;2023-08-01 00:10:26.93... IBT675