I am practicing data wrangling with pandas and have run into an issue while cleaning a GPA column that contains messy values. When I inspect the column with
food['GPA'].unique()
the output is
array(['2.4', '3.654', '3.3', '3.2', '3.5', '2.25', '3.8', '3.904', '3.4',
'3.6', '3.1', nan, '4', '2.2', '3.87', '3.7', '3.9', '2.8', '3',
'3.65', '3.89', '2.9', '3.605', '3.83', '3.292', '3.35',
'Personal ', '2.6', '3.67', '3.73', '3.79 btch', '2.71', '3.68',
'3.75', '3.92', 'Unknown', '3.77', '3.63', '3.882'], dtype=object)
My idea is to convert the values to strings first and then extract the numeric part (float or integer) from each one. But when I run the code
food['GPA'] = food['GPA'].astype(str).str.extract('(\d*\.\d+|\d+)', expand=False)
food['GPA'] = pd.to_numeric(food['GPA'], errors='coerce')
every value in the GPA column is converted to a whole number (2.0, 3.0, or 4.0) instead of keeping its decimal part:
food['GPA'].unique()
[3. 2. 4.]
Can anyone help me figure out why the decimals are being lost, and how to preserve them?
You need to add an r prefix to the pattern to make it a raw string, so the backslashes are passed to the regex engine unchanged instead of being treated as Python string escapes:
food['GPA'] = food['GPA'].astype(str).str.extract(r'(\d*\.\d+|\d+)', expand=False)
food['GPA'] = pd.to_numeric(food['GPA'], errors='coerce')
print(food['GPA'].unique())
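For anyone who wants to try this without the original data, here is a minimal sketch of the same two steps run end to end. The `food` DataFrame below is a hypothetical hand-made sample built from a few of the values shown in the question, not the real dataset, and the `pattern` and `extracted` names are only there for readability.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the real `food` DataFrame.
food = pd.DataFrame(
    {"GPA": ["2.4", "3.654", np.nan, "4", "Personal ", "3.79 btch", "Unknown"]}
)

# Raw-string pattern: a decimal such as 3.654 or .5, otherwise a plain integer.
pattern = r"(\d*\.\d+|\d+)"

# Take the first numeric token from each value; rows without one become NaN.
extracted = food["GPA"].astype(str).str.extract(pattern, expand=False)

# Convert the extracted strings to floats.
food["GPA"] = pd.to_numeric(extracted, errors="coerce")

print(food["GPA"].tolist())
```

With this sample the decimal parts are kept, '3.79 btch' is reduced to 3.79, and entries with no digits ('Personal ', 'Unknown', and the missing value) come back as NaN. Writing regex patterns as raw strings is a good habit in general, because some sequences, such as \b, mean something different to Python (a backspace character) than to the regex engine (a word boundary).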