pandas datetime time-series data-cleaning timedelta

adding time delta specific to row (Date time Cleaning)(Date time)

series data I have 2 files. Each data observation has 20 sec delay

file 1

file 2

Expected Output

What I have done so far

with code

Solution

You can generate a date_range() with freq='20s'.

Note that I changed April 31 to a valid date. I assume the real data will have valid dates.

dfs = []
for file in files:
    column_name = pd.read_csv(file, index_col=False).columns[0]
    df = pd.read_csv(file, index_col=False, names=['col1', 'col2', 'col3'], skiprows=1)
    
    # generate date range with frequency of 20 seconds
    start = pd.to_datetime(column_name, dayfirst=True)
    df['col4'] = pd.date_range(start=start, periods=len(df), freq='20s')
    
    dfs.append(df)
df = pd.concat(dfs)

#    col1  col2  col3                col4
# 0     3     5     6 2005-03-31 23:59:00
# 1     4     6     7 2005-03-31 23:59:20
# 2     8     9    10 2005-03-31 23:59:40
# 0    10    20    30 2007-06-01 23:59:00
# 1    40    50    60 2007-06-01 23:59:20
# 2    70    80    90 2007-06-01 23:59:40

To change the frequency after concatenating, you can groupby() each sub-dataframe (each one restarts at index 0) and regenerate a date_range() with the new freq:

freq = '30s'
df['col4'] = df.groupby((df.index == 0).cumsum())['col4'].apply(
    lambda g: pd.Series(pd.date_range(g[0], periods=len(g), freq=freq)))

#    col1  col2  col3                col4
# 0     3     5     6 2005-03-31 23:59:00
# 1     4     6     7 2005-03-31 23:59:30
# 2     8     9    10 2005-04-01 00:00:00
# 0    10    20    30 2007-01-06 23:59:00
# 1    40    50    60 2007-01-06 23:59:30
# 2    70    80    90 2007-01-07 00:00:00