I am trying to extract only numbers and only strings in two different dataframes. I am using regular expression to extract numbers and string.
import pandas as pd
df_num = pd.DataFrame({
'Colors': ['lila1.5', 'rosa2.5', 'gelb3.5', 'grün4', 'rot5', 'schwarz6', 'grau7', 'weiß8', 'braun9', 'hellblau10'],
'Animals': ['hu11nd', '12welpe', '13katze', 's14chlange', 'vo15gel', '16papagei', 'ku17h', '18ziege', '19pferd',
'esel20']
})
for column in df_num.columns:
df_num[column] = df_num[column].str.extract('(\d+)').astype(float)
print(df_num)
I have also tried using '([\d+][\d+\.\d+])' and '([\d+\.\d+])'
Here I am getting output but not what I am expecting. Though I am expecting float numbers I am not getting 1.5 or 2.5.
I am getting something like below image:
df_str = pd.DataFrame({
'Colors': ['lila1.5', 'rosa2.5', 'gelb3', 'grün4', 'rot5', 'schwarz6', 'grau7', 'weiß8', 'braun9', 'hellblau10'],
'Animals': ['hu11nd', '12welpe', '13katze', 's14chlange', 'vo15gel', '16papagei', 'ku17h', '18ziege', '19pferd',
'esel20']
})
for column in df_str.columns:
df_str[column] = df_str[column].str.extract('([a-zA-Z]+)')
print(df_str)
In this case when the number is at the end or in the beginning then I am getting the string but if the number placed in the middle or any other place then the result which I expect I am not getting. Current output is like below image:
I think my regular expression is not correct. Which will be the right regular expression to solve these problems? Or is there any other way to extract only numbers and only strings in pandas dataframe?
Your code is on the right track, you just need to account for the decimals and the possibility of integers :
df_num['colors_num'] = df_num.Colors.str.extract(r'(\d+[.\d]*)')
df_num['animals_num'] = df_num.Animals.str.extract(r'(\d+[.\d]*)')
df_num['colors_str'] = df_num.Colors.str.replace(r'(\d+[.\d]*)','')
df_num['animals_text'] = df_num.Animals.str.replace(r'(\d+[.\d]*)','')
Colors Animals colors_num animals_num colors_str animals_text
0 lila1.5 hu11nd 1.5 11 lila hund
1 rosa2.5 12welpe 2.5 12 rosa welpe
2 gelb3.5 13katze 3.5 13 gelb katze
3 grün4 s14chlange 4 14 grün schlange
4 rot5 vo15gel 5 15 rot vogel
5 schwarz6 16papagei 6 16 schwarz papagei
6 grau7 ku17h 7 17 grau kuh
7 weiß8 18ziege 8 18 weiß ziege
8 braun9 19pferd 9 19 braun pferd
9 hellblau10 esel20 10 20 hellblau esel