python pandas dataframe numpy pandas-resample

Pandas resample signal series with its corresponding label


I have this table with these columns: Seconds, Amplitude, Labels, Metadata. Basically, it's an ECG signal.

You can download the csv here: https://tmpfiles.org/3951223/question.csv

As you can see, the timestep of the seconds column is 0.004. How can I resample the data to a new timestep, such as 0.002, without destroying the other columns?

One such column is label_encoding. It is intended as the y label for machine learning, specifically a multiclass classification problem, and it marks the segmentation region. Its unique values are (24, 1, 27), apart from the 0 that appears outside any region in the table below.

The bound_or_peak column, in contrast, is intended for displaying or plotting the regions. It consists of 3 bits (so the maximum value is 7). If the most significant bit is set, the sample starts a region to plot (onset). If the second bit is set, the sample is a peak of the ECG wave. If the least significant bit is set, the sample ends a region to plot (offset).
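To make the bit layout concrete, here is a minimal sketch of how the three flags could be decoded. The mask names (ONSET, PEAK, OFFSET) and the short example series are illustrative assumptions, not part of the original code.

```python
import pandas as pd

# Bit masks following the description above (names are hypothetical).
ONSET = 0b100   # most significant bit: start of a region
PEAK = 0b010    # middle bit: peak of the wave
OFFSET = 0b001  # least significant bit: end of a region

# A few bound_or_peak values of the kind that appear in the table.
flags = pd.Series([0, 4, 0, 2, 0, 1], name="bound_or_peak")

# Test each bit with a bitwise AND against its mask.
decoded = pd.DataFrame({
    "bound_or_peak": flags,
    "is_onset": (flags & ONSET) != 0,
    "is_peak": (flags & PEAK) != 0,
    "is_offset": (flags & OFFSET) != 0,
})
print(decoded)
```

Because each flag is a separate bit, a single sample could in principle carry more than one flag at once (for example 6 for onset plus peak), which is worth keeping in mind when choosing how to resample this column.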

Here is the table produced by this code:

%load_ext google.colab.data_table
import numpy as np
import pandas as pd

# NumPy array holding the signal, one column per field
# (signal is my own object and is not shown here)
matrix_data = signal.signal_arr

dtype_dict = {'seconds': float, 'amplitude': float, 'label_encoding': int, 'bound_or_peak': int}

# Convert the NumPy matrix to a Pandas DataFrame with labels
df = pd.DataFrame(matrix_data, columns=dtype_dict.keys()).astype(dtype_dict)

# Display the DataFrame
df[:250]

What I mean by not destroying the other columns is this: after resampling, the other columns, such as label_encoding and bound_or_peak, should stay at the same positions in time as in the DataFrame before resampling, while amplitude should be interpolated, specifically linearly interpolated.

Actually, I have an idea: ignore the seconds column. That column can instead be compressed into a single value, the sampling frequency. A timestep of 0.004 s corresponds to 1/0.004, so the sampling frequency is 250 Hz.
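The conversion can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the row count of 250 is taken from the table in this question.

```python
# Timesteps from the question, in seconds.
old_step = 0.004   # current spacing between samples
new_step = 0.002   # desired spacing between samples

# Sampling frequency is the reciprocal of the timestep.
# round() absorbs floating-point noise in the division.
old_fs = round(1 / old_step)   # 250 Hz
new_fs = round(1 / new_step)   # 500 Hz

# With a constant sampling frequency, sample k sits at time k / fs,
# so the seconds column can be rebuilt whenever it is needed.
n_old = 250                                  # rows in the table
duration = (n_old - 1) / old_fs              # time of the last sample
n_new = int(round(duration * new_fs)) + 1    # samples on the new grid over the same span

print(old_fs, new_fs, n_new)
```

Note that doubling the sampling frequency gives 2 * 250 - 1 samples rather than 2 * 250, because no new sample is added after the last original one.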

Now the problem is how to resample or interpolate the amplitude to another sampling frequency without destroying the other columns.

Update: As a commenter pointed out, I should have used a textual representation of the table instead of a picture:

index seconds amplitude label_encoding bound_or_peak
0 0.0 0.035 0 0
1 0.004 0.06 0 0
2 0.008 0.065 0 0
3 0.012 0.075 0 0
4 0.016 0.085 0 0
5 0.02 0.075 0 0
6 0.024 0.065 0 0
7 0.028 0.065 0 0
8 0.032 0.065 0 0
9 0.036000000000000004 0.07 0 0
10 0.04 0.075 0 0
11 0.044 0.075 0 0
12 0.048 0.075 0 0
13 0.052000000000000005 0.07 0 0
14 0.056 0.065 0 0
15 0.06 0.065 0 0
16 0.064 0.065 0 0
17 0.068 0.065 0 0
18 0.07200000000000001 0.065 0 0
19 0.076 0.06 0 0
20 0.08 0.055 0 0
21 0.084 0.04 0 0
22 0.088 0.03 0 0
23 0.092 0.015 0 0
24 0.096 0.0 0 0
25 0.1 -0.01 0 0
26 0.10400000000000001 -0.02 0 0
27 0.108 -0.03 0 0
28 0.112 -0.04 0 0
29 0.116 -0.05 0 0
30 0.12 -0.06 0 0
31 0.124 -0.07 0 0
32 0.128 -0.08 0 0
33 0.132 -0.09 0 0
34 0.136 -0.095 0 0
35 0.14 -0.09 0 0
36 0.14400000000000002 -0.085 0 0
37 0.148 -0.085 0 0
38 0.152 -0.085 0 0
39 0.156 -0.09 0 0
40 0.16 -0.095 0 0
41 0.164 -0.09 0 0
42 0.168 -0.085 0 0
43 0.17200000000000001 -0.085 0 0
44 0.176 -0.085 0 0
45 0.18 -0.085 0 0
46 0.184 -0.08 0 0
47 0.188 -0.075 0 0
48 0.192 -0.075 0 0
49 0.196 -0.075 0 0
50 0.2 -0.075 0 0
51 0.20400000000000001 -0.075 0 0
52 0.20800000000000002 -0.075 0 0
53 0.212 -0.075 0 0
54 0.216 -0.07 0 0
55 0.22 -0.065 0 0
56 0.224 -0.06 0 0
57 0.228 -0.055 0 0
58 0.232 -0.055 0 0
59 0.23600000000000002 -0.055 0 0
60 0.24 -0.065 0 0
61 0.244 -0.075 0 0
62 0.248 -0.075 0 0
63 0.252 -0.075 0 0
64 0.256 -0.07 0 0
65 0.26 -0.065 0 0
66 0.264 -0.06 0 0
67 0.268 -0.06 0 0
68 0.272 -0.07 0 0
69 0.276 -0.075 0 0
70 0.28 -0.075 0 0
71 0.28400000000000003 -0.075 0 0
72 0.28800000000000003 -0.075 0 0
73 0.292 -0.07 0 0
74 0.296 -0.06 0 0
75 0.3 -0.06 0 0
76 0.304 -0.07 0 0
77 0.308 -0.075 0 0
78 0.312 -0.08 0 0
79 0.316 -0.085 0 0
80 0.32 -0.085 0 0
81 0.324 -0.085 0 0
82 0.328 -0.08 0 0
83 0.332 -0.075 0 0
84 0.336 -0.075 0 0
85 0.34 -0.08 0 0
86 0.34400000000000003 -0.085 24 4
87 0.34800000000000003 -0.08 24 0
88 0.352 -0.075 24 0
89 0.356 -0.06 24 0
90 0.36 -0.045 24 0
91 0.364 -0.035 24 0
92 0.368 -0.025 24 0
93 0.372 -0.025 24 0
94 0.376 -0.025 24 0
95 0.38 -0.02 24 0
96 0.384 -0.015 24 0
97 0.388 -0.01 24 0
98 0.392 -0.005 24 0
99 0.396 0.005 24 0
100 0.4 0.02 24 0
101 0.404 0.035 24 0
102 0.40800000000000003 0.045 24 2
103 0.41200000000000003 0.05 24 0
104 0.41600000000000004 0.055 24 0
105 0.42 0.05 24 0
106 0.424 0.035 24 0
107 0.428 0.015 24 0
108 0.432 -0.005 24 0
109 0.436 -0.035 24 0
110 0.44 -0.05 24 0
111 0.444 -0.065 24 1
112 0.448 -0.08 0 0
113 0.452 -0.09 0 0
114 0.456 -0.095 0 0
115 0.46 -0.09 0 0
116 0.464 -0.085 0 0
117 0.468 -0.09 0 0
118 0.47200000000000003 -0.095 0 0
119 0.47600000000000003 -0.095 0 0
120 0.48 -0.095 0 0
121 0.484 -0.1 0 0
122 0.488 -0.105 0 0
123 0.492 -0.105 0 0
124 0.496 -0.105 0 0
125 0.5 -0.105 0 0
126 0.504 -0.115 0 0
127 0.508 -0.115 0 0
128 0.512 -0.11 0 0
129 0.516 -0.105 0 0
130 0.52 -0.105 0 0
131 0.524 -0.105 0 0
132 0.528 -0.095 0 0
133 0.532 -0.085 0 0
134 0.536 -0.09 0 0
135 0.54 -0.095 0 0
136 0.544 -0.09 0 0
137 0.548 -0.085 0 0
138 0.552 -0.08 1 4
139 0.556 -0.075 1 0
140 0.56 -0.08 1 0
141 0.5640000000000001 -0.07 1 0
142 0.5680000000000001 -0.025 1 0
143 0.5720000000000001 0.075 1 0
144 0.5760000000000001 0.25 1 0
145 0.58 0.54 1 0
146 0.584 0.96 1 0
147 0.588 1.41 1 2
148 0.592 1.885 1 0
149 0.596 1.735 1 0
150 0.6 1.09 1 0
151 0.604 0.35 1 0
152 0.608 -0.455 1 0
153 0.612 -0.725 1 0
154 0.616 -0.705 1 0
155 0.62 -0.54 1 0
156 0.624 -0.315 1 0
157 0.628 -0.195 1 0
158 0.632 -0.115 1 1
159 0.636 -0.09 0 0
160 0.64 -0.08 0 0
161 0.644 -0.075 0 0
162 0.648 -0.08 0 0
163 0.652 -0.085 0 0
164 0.656 -0.085 0 0
165 0.66 -0.085 0 0
166 0.664 -0.08 0 0
167 0.668 -0.08 0 0
168 0.672 -0.085 0 0
169 0.676 -0.085 0 0
170 0.68 -0.085 0 0
171 0.684 -0.075 0 0
172 0.6880000000000001 -0.065 0 0
173 0.6920000000000001 -0.07 0 0
174 0.6960000000000001 -0.075 0 0
175 0.7000000000000001 -0.07 0 0
176 0.704 -0.065 0 0
177 0.708 -0.06 0 0
178 0.712 -0.055 0 0
179 0.716 -0.05 0 0
180 0.72 -0.045 0 0
181 0.724 -0.04 0 0
182 0.728 -0.035 27 4
183 0.732 -0.035 27 0
184 0.736 -0.035 27 0
185 0.74 -0.035 27 0
186 0.744 -0.035 27 0
187 0.748 -0.03 27 0
188 0.752 -0.02 27 0
189 0.756 -0.01 27 0
190 0.76 -0.005 27 0
191 0.764 0.0 27 0
192 0.768 0.005 27 0
193 0.772 0.005 27 0
194 0.776 0.005 27 0
195 0.78 0.01 27 0
196 0.784 0.025 27 0
197 0.788 0.04 27 0
198 0.792 0.045 27 0
199 0.796 0.05 27 0
200 0.8 0.055 27 0
201 0.804 0.055 27 0
202 0.808 0.055 27 0
203 0.812 0.06 27 0
204 0.8160000000000001 0.065 27 0
205 0.8200000000000001 0.07 27 0
206 0.8240000000000001 0.085 27 0
207 0.8280000000000001 0.1 27 0
208 0.8320000000000001 0.105 27 0
209 0.836 0.105 27 0
210 0.84 0.11 27 0
211 0.844 0.115 27 0
212 0.848 0.12 27 0
213 0.852 0.125 27 0
214 0.856 0.12 27 2
215 0.86 0.115 27 0
216 0.864 0.115 27 0
217 0.868 0.115 27 0
218 0.872 0.115 27 0
219 0.876 0.115 27 0
220 0.88 0.115 27 0
221 0.884 0.115 27 0
222 0.888 0.115 27 0
223 0.892 0.115 27 0
224 0.896 0.11 27 0
225 0.9 0.105 27 0
226 0.904 0.1 27 0
227 0.908 0.09 27 0
228 0.912 0.07 27 0
229 0.916 0.05 27 0
230 0.92 0.035 27 0
231 0.924 0.015 27 0
232 0.928 -0.005 27 0
233 0.932 -0.02 27 0
234 0.936 -0.03 27 0
235 0.9400000000000001 -0.04 27 0
236 0.9440000000000001 -0.05 27 0
237 0.9480000000000001 -0.055 27 0
238 0.9520000000000001 -0.06 27 0
239 0.9560000000000001 -0.07 27 1
240 0.96 -0.08 0 0
241 0.964 -0.085 0 0
242 0.968 -0.085 0 0
243 0.972 -0.085 0 0
244 0.976 -0.085 0 0
245 0.98 -0.085 0 0
246 0.984 -0.08 0 0
247 0.988 -0.075 0 0
248 0.992 -0.08 0 0
249 0.996 -0.085 0 0

Solution

  • This is a straightforward application of resample(), but you have to decide, column by column, how the values at the new timestamps are filled in.

    from io import StringIO
    
    import pandas as pd
    
    content = '''
    index   seconds     amplitude   label_encoding  bound_or_peak
    0   0.0     0.035   0   0
    1   0.004   0.06    0   0
    2   0.008   0.065   0   0
    3   0.012   0.075   0   0
    4   0.016   0.085   0   0
    5   0.02    0.075   0   0
    6   0.024   0.065   0   0
    ...
    245     0.98    -0.085  0   0
    246     0.984   -0.08   0   0
    247     0.988   -0.075  0   0
    248     0.992   -0.08   0   0
    249     0.996   -0.085  0   0
    '''
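    with StringIO(content) as file:
        # sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
        df = pd.read_csv(file, sep=r'\s+')
    # Turn the float seconds into timedeltas so that resample() can be used on the index
    df['seconds'] *= pd.Timedelta(1, 's')
    df.set_index('seconds', drop=True, inplace=True)
    
    sampler = df.resample(rule='2ms')
    # Labels: copy from the nearest original sample
    resampled = sampler.nearest()[['index', 'label_encoding']]
    # Amplitude: linear interpolation in time
    resampled['amplitude'] = sampler.interpolate('time')['amplitude']
    # Markers: keep them on the original samples only; new rows get 0
    resampled['bound_or_peak'] = sampler.asfreq(fill_value=0)['bound_or_peak']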
    
    pd.options.display.width = 200
    pd.options.display.max_columns = 10
    print(resampled)
    
                            index  label_encoding  amplitude  bound_or_peak
    seconds                                                                
    0 days 00:00:00             0               0     0.0350              0
    0 days 00:00:00.002000      1               0     0.0475              0
    0 days 00:00:00.004000      1               0     0.0600              0
    0 days 00:00:00.006000      2               0     0.0625              0
    0 days 00:00:00.008000      2               0     0.0650              0
    ...                       ...             ...        ...            ...
    0 days 00:00:00.988000    247               0    -0.0750              0
    0 days 00:00:00.990000    248               0    -0.0775              0
    0 days 00:00:00.992000    248               0    -0.0800              0
    0 days 00:00:00.994000    249               0    -0.0825              0
    0 days 00:00:00.996000    249               0    -0.0850              0
    
    [499 rows x 4 columns]
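For the idea raised in the question of dropping the seconds column and working with sampling frequencies only, the same per-column decisions can be sketched with NumPy. This is an assumption-laden illustration rather than part of the answer above: the helper name resample_labeled_signal, the round-half-up tie rule for labels, and the choice to merge colliding markers with a bitwise OR are all hypothetical.

```python
import numpy as np
import pandas as pd


def resample_labeled_signal(df, old_fs, new_fs):
    """Hypothetical helper: resample from old_fs to new_fs (both in Hz)."""
    n_old = len(df)
    n_new = int(round((n_old - 1) * new_fs / old_fs)) + 1

    # Position of every new sample, measured in units of old samples.
    pos = np.arange(n_new) * old_fs / new_fs

    out = pd.DataFrame({"seconds": np.arange(n_new) / new_fs})

    # Amplitude: linear interpolation between neighbouring old samples.
    out["amplitude"] = np.interp(pos, np.arange(n_old), df["amplitude"].to_numpy())

    # Labels: take the nearest old sample (ties go to the later sample).
    nearest = np.clip(np.floor(pos + 0.5).astype(int), 0, n_old - 1)
    out["label_encoding"] = df["label_encoding"].to_numpy()[nearest]

    # Markers: move each nonzero marker to the single new sample closest to
    # its original time, so that no marker is duplicated. If two markers
    # land on the same new sample (possible when downsampling), OR them.
    old_markers = df["bound_or_peak"].to_numpy()
    src = np.flatnonzero(old_markers)
    dst = np.clip(np.floor(src * new_fs / old_fs + 0.5).astype(int), 0, n_new - 1)
    markers = np.zeros(n_new, dtype=old_markers.dtype)
    np.bitwise_or.at(markers, dst, old_markers[src])
    out["bound_or_peak"] = markers
    return out


# Tiny made-up frame in the same layout as the question's table.
df = pd.DataFrame({
    "amplitude": [0.035, 0.06, 0.065, 0.075],
    "label_encoding": [0, 24, 24, 0],
    "bound_or_peak": [0, 4, 1, 0],
})
resampled = resample_labeled_signal(df, old_fs=250, new_fs=500)
print(resampled)
```

As with nearest() in the pandas version, the nearest-sample rule lets a labelled region extend by one new sample on each side of its onset and offset markers; a forward fill would be an alternative if the region must not start before its onset.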