I am hoping someone can help me optimize the following Python/Pandas code. My code works, but I know there must be a cleaner and faster way to perform the operation under consideration.
I am looking for an optimized strategy because my use case will involve 16 unique ADC Types, as opposed to 4 in the example below. Also, my initial Pandas Series (i.e. ADC Type column), will be several 100,000 data points in length, rather than 8 in the example below.
import numpy as np
import pandas as pd
from enum import Enum
data_dict = {"RAW": [4000076160, 5354368, 4641792, 4289860736,
4136386944, 5440384, 4772864, 4289881216],
"ADC_TYPE": [3, 7, 8, 9,
3, 7, 8, 9]}
df = pd.DataFrame(data_dict)
print(df)
The initial DataFrame (i.e. df) is:
RAW ADC_TYPE
0 4000076160 3
1 5354368 7
2 4641792 8
3 4289860736 9
4 4136386944 3
5 5440384 7
6 4772864 8
7 4289881216 9
I then manipulate the DataFrame above using the following code:
unique_types = df["ADC_TYPE"].unique()
dict_concat = {"RAW": [],
"ADC_TYPE_3": [],
"ADC_TYPE_7": [],
"ADC_TYPE_8": [],
"ADC_TYPE_9": []}
df_concat = pd.DataFrame(dict_concat)
for adc_type in unique_types:
df_group = df.groupby(["ADC_TYPE"]).get_group(adc_type).rename(columns={"ADC_TYPE": f"ADC_TYPE_{adc_type}"})
df_concat = pd.concat([df_concat, df_group])
print(df_concat.sort_index())
The returned DataFrame (i.e. df_concat) is displayed below. The ordering of RAW and the associated ADC Type values must remain unchanged. I need the return DataFrame to look just like the DataFrame below.
RAW ADC_TYPE_3 ADC_TYPE_7 ADC_TYPE_8 ADC_TYPE_9
0 4.000076e+09 3.0 NaN NaN NaN
1 5.354368e+06 NaN 7.0 NaN NaN
2 4.641792e+06 NaN NaN 8.0 NaN
3 4.289861e+09 NaN NaN NaN 9.0
4 4.136387e+09 3.0 NaN NaN NaN
5 5.440384e+06 NaN 7.0 NaN NaN
6 4.772864e+06 NaN NaN 8.0 NaN
7 4.289881e+09 NaN NaN NaN 9.0
I liked the idea of using get_dummies
, so I modified it a bit:
df = (pd.get_dummies(df, 'ADC_TYPE', '_', columns=['ADC_TYPE'])
.replace(1, np.nan)
.apply(lambda x: x.fillna(df['ADC_TYPE']))
.replace(0, np.nan))
Output:
RAW ADC_TYPE_3 ADC_TYPE_7 ADC_TYPE_8 ADC_TYPE_9
0 4000076160 3.0 NaN NaN NaN
1 5354368 NaN 7.0 NaN NaN
2 4641792 NaN NaN 8.0 NaN
3 4289860736 NaN NaN NaN 9.0
4 4136386944 3.0 NaN NaN NaN
5 5440384 NaN 7.0 NaN NaN
6 4772864 NaN NaN 8.0 NaN
7 4289881216 NaN NaN NaN 9.0