I scraped some data and saved it, and now I'm cleaning it. The problem is that my changes are not saved, even though I write a new file with pandas. What am I missing? Why are the values I compute in the loop not assigned back to the original column? I found some information about apply(), but it didn't work for me; I ended up with empty rows instead. My code:
import re
import pandas as pd
needtoclean = pd.read_excel(r'C:\Users\sound\Desktop\data.xlsx')
# convert everything to str, because the column mixes int, float and str
needtoclean['Salary'] = needtoclean['Salary'].astype(str)
columnSeriesObj = needtoclean['Salary']
for value in columnSeriesObj.values:
    #print(value)
    value = value.replace('Nuo',"").replace('Iki','')
    value = value.strip()
    value = re.split(r'[\s,-]+', value)
    if len(value) > 1:
        value = (int(value[0])+int(value[1]))/2
    else:
        value = float(value[0])
    value = round(value)
    #print(value)
columnTitle = needtoclean['Job Title']
for value in columnTitle.values:
    #print(value)
    value = value.title()
    value = value.replace('/', ' ').replace('(-Ė)','').replace('(-A)','').replace('(-As)', '')
    value = value.strip()
    value = value.split()
    #print(value)
needtoclean.to_excel("datatest.xlsx")
Salary original values:
Nuo 2600
2000-2500
1487-2479
Iki 3636
1600-5700
1200-2000
And I expected:
2600
2250
1983
3636
3650
1600
Job title:
IT infrastruktūros palaikymo skyriaus vadovas (-ė)
Bankinių kortelių skaitytuvų programuotojas (-a)
IT sistemų priežiūros skyriaus vadovas (-ė)
Produktų palaikymo skyriaus vadovas (ė) (programinė įranga)
Expected to get:
['It', 'Infrastruktūros', 'Palaikymo', 'Skyriaus', 'Vadovas']
['Bankinių', 'Kortelių', 'Skaitytuvų', 'Programuotojas']
['It', 'Sistemų', 'Priežiūros', 'Skyriaus', 'Vadovas']
['Produktų', 'Palaikymo', 'Skyriaus', 'Vadovas', '(Ė)', '(Programinė', 'Įranga)']
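Reassigning value inside the loop only rebinds the loop variable; it never writes anything back to the DataFrame. Create an empty list, append each cleaned value to it, then replace the old column with the new values. Do the same for the other column.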
salary_cleaned_values = []
columnSeriesObj = needtoclean['Salary']
for value in columnSeriesObj.values:
    #print(value)
    value = value.replace('Nuo',"").replace('Iki','')
    value = value.strip()
    value = re.split(r'[\s,-]+', value)
    if len(value) > 1:
        value = (int(value[0])+int(value[1]))/2
    else:
        value = float(value[0])
    value = round(value)
    salary_cleaned_values.append(value)
# assign the cleaned list back so the DataFrame actually changes
needtoclean['Salary'] = salary_cleaned_values
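Since the question mentions apply(), here is a rough sketch of the same cleaning done with Series.apply for both columns. The helper names clean_salary and clean_title are made up for this example, and the small DataFrame stands in for the Excel file. It assumes every salary cell contains at least one number (a missing value would raise an error). The key point is that the result of apply() has to be assigned back to the column; calling it without the assignment changes nothing.

```python
import re
import pandas as pd

def clean_salary(value):
    # Hypothetical helper: 'Nuo 2600' -> 2600, '2000-2500' -> 2250 (midpoint).
    text = str(value).replace('Nuo', '').replace('Iki', '').strip()
    parts = re.split(r'[\s,-]+', text)
    if len(parts) > 1:
        return round((int(parts[0]) + int(parts[1])) / 2)
    return round(float(parts[0]))

def clean_title(value):
    # Hypothetical helper: title-case, drop gender suffixes, split into words.
    text = value.title().replace('/', ' ')
    for suffix in ('(-Ė)', '(-A)', '(-As)'):
        text = text.replace(suffix, '')
    return text.split()

# Sample rows taken from the question, standing in for data.xlsx.
df = pd.DataFrame({
    'Salary': ['Nuo 2600', '2000-2500', '1487-2479', 'Iki 3636'],
    'Job Title': [
        'IT sistemų priežiūros skyriaus vadovas (-ė)',
        'Bankinių kortelių skaitytuvų programuotojas (-a)',
        'IT infrastruktūros palaikymo skyriaus vadovas (-ė)',
        'Produktų palaikymo skyriaus vadovas (ė) (programinė įranga)',
    ],
})

# apply() returns a new Series; assign it back to keep the result.
df['Salary'] = df['Salary'].apply(clean_salary)
df['Job Title'] = df['Job Title'].apply(clean_title)
print(df)
```

With the real file, replace the sample DataFrame with the pd.read_excel(...) call from the question and finish with to_excel("datatest.xlsx") as before.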