For example, consider the following df:
    Time  colA  colB
0    1.1     2     2
1    2.2     2     2
2    3.4     3     5
3    4.5     3     5
4    5.6     4     5
5    6.2     4     6
6    7.4     4     6
7    8.5     2     6
8    9.8     2     5
9   10.1     2     5
10  11.2     2     5
The output I am expecting is a report CSV file with the following columns:
Col_name  unique_value  Duration
colA      2             3.8s
colA      3             1.1s
colA      4             1.8s
colB      2             1.1s
colB      5             3.6s
colB      6             2.3s
For example, to calculate colA:
unique value = 2
Duration = [time difference over the 1st consecutive run of 2 (2.2 - 1.1)] + [time difference over the 2nd consecutive run of 2 (11.2 - 8.5)] = 1.1 + 2.7 = 3.8s
One approach I tried is:
df["answer"] = df['colA'].diff().eq(0)
As the next step, I was planning to collect the False rows in one list and the True rows in another, and take the difference between the two lists.
What I am confused about is how to link these with the unique values.
Please help me figure out whether the existing logic works or whether I should change it.
*Creating a new column that flags consecutive values with True and False is a good start. However, to calculate the duration for each unique value in each column, you can use the following steps: iterate over each unique value in each column; for each unique value, find its consecutive occurrences and calculate their duration; store the results in a new DataFrame.
import pandas as pd

# Sample DataFrame (values copied from the question)
data = {
    'Time': [1.1, 2.2, 3.4, 4.5, 5.6, 6.2, 7.4, 8.5, 9.8, 10.1, 11.2],
    'colA': [2, 2, 3, 3, 4, 4, 4, 2, 2, 2, 2],
    'colB': [2, 2, 5, 5, 5, 6, 6, 6, 5, 5, 5]
}
df = pd.DataFrame(data)

# Function to calculate the total duration for each unique value in a column
def calculate_duration(column_name):
    durations = []
    unique_values = df[column_name].unique()
    for value in unique_values:
        # Index labels of all rows holding this value
        consecutive_indices = df[df[column_name] == value].index.to_list()
        # Split those labels into runs of consecutive rows
        consecutive_occurrences = []
        current_occurrence = [consecutive_indices[0]]
        for i in range(1, len(consecutive_indices)):
            if consecutive_indices[i] - consecutive_indices[i-1] == 1:
                current_occurrence.append(consecutive_indices[i])
            else:
                consecutive_occurrences.append(current_occurrence)
                current_occurrence = [consecutive_indices[i]]
        consecutive_occurrences.append(current_occurrence)
        # Sum (last Time - first Time) over every run of this value
        total = 0.0
        for occurrence in consecutive_occurrences:
            start_time = df.loc[occurrence[0], 'Time']
            end_time = df.loc[occurrence[-1], 'Time']
            total += end_time - start_time
        # Round to one decimal to hide floating-point noise (e.g. 2.6999999999999993)
        durations.append((value, round(total, 1)))
    return durations

# Collect one report row per (column, unique value)
rows = []
# Calculate durations for each column except 'Time'
for column in df.columns[1:]:
    for value, duration in calculate_duration(column):
        rows.append({'Col_name': column, 'unique_value': value, 'Duration': duration})

# Build the report in one step (DataFrame.append was removed in pandas 2.0)
report_df = pd.DataFrame(rows, columns=['Col_name', 'unique_value', 'Duration'])

# Export to CSV
report_df.to_csv('report.csv', index=False)
Finally, export the DataFrame to a CSV file.*
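
The `diff().eq(0)` idea from the question can also be linked to the unique values without explicit loops. The following is only a sketch under a few assumptions: rows are already sorted by `Time`, a run's duration is its last `Time` minus its first, and the names `run_durations`, `run_id` and `time_col` are made up for this illustration. It labels each consecutive run with a counter (compare each row with the previous one, then take the cumulative sum), measures each run, and adds up the runs that share a value.

```python
import pandas as pd

# Data copied from the question's table.
df = pd.DataFrame({
    "Time": [1.1, 2.2, 3.4, 4.5, 5.6, 6.2, 7.4, 8.5, 9.8, 10.1, 11.2],
    "colA": [2, 2, 3, 3, 4, 4, 4, 2, 2, 2, 2],
    "colB": [2, 2, 5, 5, 5, 6, 6, 6, 5, 5, 5],
})


def run_durations(frame, column, time_col="Time"):
    """Total time per value of `column`, summed over its consecutive runs."""
    # A new run starts wherever the value differs from the previous row.
    run_id = frame[column].ne(frame[column].shift()).cumsum().rename("run_id")
    # One row per run: the value it holds and its first/last timestamps.
    runs = frame.groupby(run_id).agg(
        unique_value=(column, "first"),
        start=(time_col, "first"),
        end=(time_col, "last"),
    )
    runs["Duration"] = runs["end"] - runs["start"]
    # Add up the runs that share a value, keeping order of first appearance.
    total = runs.groupby("unique_value", sort=False)["Duration"].sum().round(1)
    out = total.reset_index()
    out.insert(0, "Col_name", column)
    return out


# Stack the per-column reports into one table.
report = pd.concat(
    [run_durations(df, col) for col in ["colA", "colB"]], ignore_index=True
)
print(report)
```

Calling `report.to_csv('report.csv', index=False)` afterwards would write the same three columns as the expected output. The durations stay numeric here; appending the `s` suffix is left as a formatting step.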