At first I got the error "Can't pickle local object". I found a suggestion to use multiprocess instead of the multiprocessing library, but now the class that contains the method is initialized once for every processor used. In addition, the data is either not saved or is lost, and the output is incorrect. `
import pathlib
import pandas as pd
from datetime import datetime
import csv
import re
import multiprocess as mp


class InputConect:
    def __init__(self):
        self.file_name = input('Введите название файла: ')
        self.filter_param = input('Введите название профессии: ')

    @staticmethod
    def print_data(file_name, filter_param):
        salary_by_years = {year: 0 for year in unique_years}
        vacs_by_years = {year: 0 for year in unique_years}
        vac_salary_by_years = {year: 0 for year in unique_years}
        vac_counts_by_years = {year: 0 for year in unique_years}

        def make_statistic(file):
            #writes data to the dictionary like this:
            salary_by_years[year] = int(one_year_vacancies.salary.mean())
            vacs_by_years[year] = one_year_vacancies.shape[0]

        if __name__ == '__main__':
            with mp.Pool() as p:
                mp.freeze_support()
                p.map(make_statistic, filelist)
                p.close()
                p.join()
            # m = mp.map(target=make_statistic, args=filelist)
            # m.start()
            # m.join()
            print('Динамика уровня зарплат по годам:', salary_by_years)
            print('Динамика количества вакансий по годам:', vacs_by_years)
            print('Динамика уровня зарплат по годам для выбранной профессии:', vac_salary_by_years)
            print('Динамика количества вакансий по годам для выбранной профессии:', vac_counts_by_years)


parameters = InputConect()
InputConect.print_data(parameters.file_name, parameters.filter_param)
` Output:
Введите название файла: vacancies_by_year.csv
Введите название профессии: Аналитик
Введите название файла: Введите название файла: Введите название файла: Введите название файла: vacancies_by_year.csv
Введите название профессии: Аналитик
Введите название профессии: Аналитик
Введите название профессии: Аналитик
Введите название профессии: Аналитик
Динамика уровня зарплат по годам: {2007: 0, 2008: 0, 2009: 0, 2010: 0, 2011: 0, 2012: 0, 2013: 0, 2014: 0, 2015: 0, 2016: 0, 2017: 0, 2018: 0, 2019: 0, 2020: 0, 2021: 0, 2022: 0}
Динамика количества вакансий по годам: {2007: 0, 2008: 0, 2009: 0, 2010: 0, 2011: 0, 2012: 0, 2013: 0, 2014: 0, 2015: 0, 2016: 0, 2017: 0, 2018: 0, 2019: 0, 2020: 0, 2021: 0, 2022: 0}
Динамика уровня зарплат по годам для выбранной профессии: {2007: 0, 2008: 0, 2009: 0, 2010: 0, 2011: 0, 2012: 0, 2013: 0, 2014: 0, 2015: 0, 2016: 0, 2017: 0, 2018: 0, 2019: 0, 2020: 0, 2021: 0, 2022: 0}
Динамика количества вакансий по годам для выбранной профессии: {2007: 0, 2008: 0, 2009: 0, 2010: 0, 2011: 0, 2012: 0, 2013: 0, 2014: 0, 2015: 0, 2016: 0, 2017: 0, 2018: 0, 2019: 0, 2020: 0, 2021: 0, 2022: 0}
This is much too long for a comment and so:
You have several issues with your code, but the main one is that you still do not have a minimal, reproducible example:

1. `filelist` in method `print_data` appears to be undefined, and `print_data` is passed two arguments that are never referenced, which makes very little sense. In function `make_statistic`, the `file` argument is not referenced and `one_year_vacancies` is undefined. After dictionaries `vac_salary_by_years` and `vac_counts_by_years` are initialized, they are never modified. Is that really correct?
2. `if __name__ == '__main__':` is in the wrong place. This test only makes sense at module (global) scope in your main script.
3. `make_statistic`, in your case, needs to be defined at module scope. Also, since this function runs in a different address space, it cannot modify the copies of `salary_by_years` and `vacs_by_years` that are in the main process.
4. You have also created a class `InputConect`, but except for the `__init__` method, all the other methods are static and therefore have no access to attributes `file_name` and `filter_param`. If `print_data` were not a static method, that would be a different situation. It would also make your class more reusable if the class were not responsible for inputting attributes `file_name` and `filter_param` or for printing results, and instead these values were passed to and returned from the class instance. This would allow the class to be used where these values do not come from console input and where the output needs to go somewhere other than the console. The idea is to separate business logic from input/output if you can.
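To illustrate the point about separate address spaces, here is a minimal sketch that uses the standard `multiprocessing` module (an assumption on my part; once the worker is at module scope, the standard library can pickle it). The names `counts`, `update_in_place`, `count_one` and `merge` are hypothetical and not from your code. A worker that mutates a module-level dictionary only changes its own copy, while a worker that returns a value lets the parent do the merging:

```python
import multiprocessing as mp

# Module-level dictionary; every worker process works on its own copy of it.
counts = {2020: 0, 2021: 0}


def update_in_place(year):
    """Mutates the worker's copy of `counts`; the parent never sees the change."""
    counts[year] += 1


def count_one(year):
    """Returns the result instead, so the parent process can merge it."""
    return year, 1


def merge(pairs, years):
    """Combine the (year, n) pairs returned by the workers into a new dictionary."""
    merged = {year: 0 for year in years}
    for year, n in pairs:
        merged[year] += n
    return merged


if __name__ == '__main__':
    work = [2020, 2021, 2020]
    with mp.Pool(2) as pool:
        pool.map(update_in_place, work)    # changes are lost with the workers
        pairs = pool.map(count_one, work)  # results travel back to the parent
    print(counts)                          # still all zeros in the parent
    print(merge(pairs, [2020, 2021]))      # {2020: 2, 2021: 1}
```

The all-zero dictionaries in your output are the same effect: the workers filled in their own copies, and the parent printed its untouched originals.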
This is the general idea (but I cannot fix the undefined and unreferenced variables that you have):
import pathlib
import pandas as pd
from datetime import datetime
import csv
import re
import multiprocess as mp


def make_statistic(file):
    # Defined at module scope so that it can be sent to the worker processes.
    # `year` and `one_year_vacancies` must be derived from `file` here
    # (that code is not shown in the question).
    # Instead of writing data to the dictionaries like this:
    #salary_by_years[year] = int(one_year_vacancies.salary.mean())
    #vacs_by_years[year] = one_year_vacancies.shape[0]
    # Return necessary values:
    return year, int(one_year_vacancies.salary.mean()), one_year_vacancies.shape[0]


class InputConect:
    def __init__(self, file_name, filter_param):
        self.file_name = file_name
        self.filter_param = filter_param

    def compute(self):
        # `unique_years` and `filelist` are undefined in the question.
        salary_by_years = {year: 0 for year in unique_years}
        vacs_by_years = {year: 0 for year in unique_years}
        vac_salary_by_years = {year: 0 for year in unique_years}
        vac_counts_by_years = {year: 0 for year in unique_years}
        with mp.Pool() as p:
            results = p.map(make_statistic, filelist)
            # Process each tuple returned by `make_statistic`:
            for year, value1, value2 in results:  # Unpack the tuple
                salary_by_years[year] = value1
                vacs_by_years[year] = value2
            p.close()
            p.join()
        # Return values rather than printing for greater reusability
        return salary_by_years, vacs_by_years, vac_salary_by_years, vac_counts_by_years


if __name__ == '__main__':
    # Only has an effect in a frozen Windows executable; call it first.
    mp.freeze_support()
    file_name = input('Введите название файла: ')
    filter_param = input('Введите название профессии: ')
    input_connect = InputConect(file_name, filter_param)
    salary_by_years, vacs_by_years, vac_salary_by_years, vac_counts_by_years = input_connect.compute()
    print('Динамика уровня зарплат по годам:', salary_by_years)
    print('Динамика количества вакансий по годам:', vacs_by_years)
    print('Динамика уровня зарплат по годам для выбранной профессии:', vac_salary_by_years)
    print('Динамика количества вакансий по годам для выбранной профессии:', vac_counts_by_years)
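Since the code above cannot run without the missing pieces, here is a self-contained sketch of the same pattern that does run. It rests on assumptions that are not in your question: one CSV file per year named `<year>.csv`, a single `salary` column, the standard `csv` and `multiprocessing` modules instead of pandas and `multiprocess`, and made-up sample data written by a hypothetical `write_sample_files` helper. Treat it as an illustration of "the worker returns, the parent merges" rather than as your actual logic:

```python
import csv
import multiprocessing as mp
import tempfile
from pathlib import Path


def make_statistic(path):
    """Worker: read one per-year CSV and return (year, mean salary, row count)."""
    path = Path(path)
    year = int(path.stem)  # assumption: the file is named "<year>.csv"
    with path.open(newline='', encoding='utf-8') as f:
        salaries = [float(row['salary']) for row in csv.DictReader(f)]
    mean_salary = int(sum(salaries) / len(salaries)) if salaries else 0
    return year, mean_salary, len(salaries)


def collect(results):
    """Parent side: merge the tuples returned by the workers into dictionaries."""
    salary_by_years, vacs_by_years = {}, {}
    for year, mean_salary, count in results:
        salary_by_years[year] = mean_salary
        vacs_by_years[year] = count
    return salary_by_years, vacs_by_years


def write_sample_files(directory):
    """Create two tiny per-year CSV files (made-up data, for illustration only)."""
    sample = {2021: [100, 200], 2022: [300, 500, 700]}
    paths = []
    for year, salaries in sample.items():
        path = Path(directory) / f'{year}.csv'
        with path.open('w', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)
            writer.writerow(['salary'])
            writer.writerows([s] for s in salaries)
        paths.append(str(path))
    return paths


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        filelist = write_sample_files(tmp)
        with mp.Pool() as pool:
            results = pool.map(make_statistic, filelist)
    salary_by_years, vacs_by_years = collect(results)
    print(salary_by_years)  # {2021: 150, 2022: 500}
    print(vacs_by_years)    # {2021: 2, 2022: 3}
```

Because nothing at module scope calls `input()` or starts the pool, a worker process that re-imports the script only defines the functions, which is what stops the prompts from repeating.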