I would like to convert a pandas dataframe which contains space separated hex numbers into integers and floats (some columns only contain integers, some columns are floats). The dataframe has an index column (its a time variable).
The dataframe looks like this:
print(selected_df.XData)
DataSrvTime
2021-07-08T08:43:29.0616419 C7 10 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
2021-07-08T08:43:30.0866790 C2 16 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
2021-07-08T08:43:31.1107931 CB E 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
2021-07-08T08:43:32.1398927 BF 13 2 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
2021-07-08T08:43:33.1697282 BA 15 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
...
2021-07-08T11:12:51.1695194 4 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
2021-07-08T11:12:52.2000730 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
2021-07-08T11:12:53.2248873 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
2021-07-08T11:12:54.2574457 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
2021-07-08T11:12:56.3157504 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
Name: XData, Length: 7799, dtype: object
First of all I split the dataset -each into an own column. The first character of the dataframe is a delimiter. I dismissed the first column therefor to only keep the data containing columns and add a column name:
# Defining the column names:
header = ["00","01","02","03","04","05","06","07","08","09","10","11","12","13","14","15","16","17","18","19","20","21","22","23","MTF1","MTF3","MTF5","MTF7","SP","SFR","T","RH","PM_1","PM_2","PM_3","#RG","#RL","#RR","#RC","LS","Checksum"]
# split data into single columns
x_df = selected_df.XData.str.split(' ', expand=True)
# dismiss first delimiter column
x_df.drop(0, inplace=True, axis=1)
#add column names
x_df.columns = header
Now, I tried different ways to convert the hex data into integers and/or floats which all of them result in errors. Maybe anyone of you has an idea to make it better than me..
# Simply test apply solution without header yet:
res = x_df.apply(int, base = 16)
Results in this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-107-004c8b72cd04> in <module>
---> 39 res = x_df.apply(int, base = 16)
c:\python38\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwargs)
8831 kwargs=kwargs,
8832 )
-> 8833 return op.apply().__finalize__(self, method="apply")
8834
8835 def applymap(
c:\python38\lib\site-packages\pandas\core\apply.py in apply(self)
725 return self.apply_raw()
726
--> 727 return self.apply_standard()
728
729 def agg(self):
c:\python38\lib\site-packages\pandas\core\apply.py in apply_standard(self)
849
850 def apply_standard(self):
--> 851 results, res_index = self.apply_series_generator()
852
853 # wrap results
c:\python38\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
865 for i, v in enumerate(series_gen):
866 # ignore SettingWithCopy here in case the user mutates
--> 867 results[i] = self.f(v)
868 if isinstance(results[i], ABCSeries):
869 # If we have a view on v, we need to make a copy because
c:\python38\lib\site-packages\pandas\core\apply.py in f(x)
136
137 def f(x):
--> 138 return func(x, *args, **kwargs)
139
140 else:
TypeError: int() can't convert non-string with explicit base
Doing a print(x_df.dtypes)
shows that all columns are of "object" type.
I thougt that the str.split
made already strings of the splitted columns?
Then, the resulting datatypes of the first 24 columns should be integer while the datatype of the rest is float - except of the last one, which is a simple checksum.
Do I have to solve that with a loop?
Thanks for reading
I found a semi-solution (integer values only in this step):
With a for
loop I iterate through the columns (in this case I overwrite them; if not intented I could keep the original data and create a new dataframe with manipulated = x_df.copy()
)
To avoid the error I got earlier I have to add the iteration variable to specify the current column:
# convert hex data to int
for column in x_df:
x_df[column] = x_df[column].apply(int, base=16)
print(x_df)
Remark: But still this only works if there are no NaN-values somewhere in the data. NaNs have to be dropped before.