I'm struggling with the pandarallel library.
Here is what I'm doing:
def ponerfecha(row):
    import datetime
    a = datetime.datetime(2023, 9, 10, row['HORA'], row['MINUTO'])
    return a
CargaT['FECHATRX'] = CargaT.parallel_apply(lambda row: ponerfecha(row), axis=1)
It is not working. I'm getting the following error: NameError: name 'ponerfecha' is not defined
Example:
import pandas as pd
data = {'HORA': [10, 12, 15],
        'MINUTO': [30, 45, 0]}
CargaT = pd.DataFrame(data)
Expected output:
   HORA  MINUTO            FECHATRX
0    10      30 2023-09-10 10:30:00
1    12      45 2023-09-10 12:45:00
2    15       0 2023-09-10 15:00:00
Any clue what I'm doing wrong? Without the parallel call it works perfectly.
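For comparison, this is roughly the non-parallel version that works. It is a minimal sketch, assuming plain DataFrame.apply stands in for parallel_apply and using the example data above:

```python
import datetime

import pandas as pd


def ponerfecha(row):
    # Build a fixed-date timestamp from the row's hour and minute.
    return datetime.datetime(2023, 9, 10, row['HORA'], row['MINUTO'])


CargaT = pd.DataFrame({'HORA': [10, 12, 15],
                       'MINUTO': [30, 45, 0]})

# Serial baseline: plain apply, run row by row in the main process.
CargaT['FECHATRX'] = CargaT.apply(ponerfecha, axis=1)
print(CargaT)
```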
A complete example that's working for me:
import pandas as pd
from pandarallel import pandarallel
pandarallel.initialize(progress_bar=True)
def ponerfecha(row):
    import datetime
    a = datetime.datetime(2023, 9, 10, row['HORA'], row['MINUTO'])
    return a
data = {'HORA': [10, 12, 15],
        'MINUTO': [30, 45, 0]}
CargaT = pd.DataFrame(data)
CargaT['FECHATRX'] = CargaT.parallel_apply(lambda row: ponerfecha(row), axis=1)
print(CargaT)
Prints:
INFO: Pandarallel will run on 8 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
100.00% :::::::::::::::::::::::::::::::::::::::: | 1 / 1 |
100.00% :::::::::::::::::::::::::::::::::::::::: | 1 / 1 |
100.00% :::::::::::::::::::::::::::::::::::::::: | 1 / 1 |
   HORA  MINUTO            FECHATRX
0    10      30 2023-09-10 10:30:00
1    12      45 2023-09-10 12:45:00
2    15       0 2023-09-10 15:00:00
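As a side note, if the goal is only to build FECHATRX from HORA and MINUTO, a vectorized approach may avoid the row-wise apply (and the worker processes) altogether. The sketch below is not a pandarallel fix; it assumes the date is always the fixed 2023-09-10 from the question and relies on pd.to_datetime assembling datetimes from columns named year, month, day, hour and minute:

```python
import pandas as pd

CargaT = pd.DataFrame({'HORA': [10, 12, 15],
                       'MINUTO': [30, 45, 0]})

# Assemble the datetime parts as columns; the scalar year/month/day
# values are broadcast to every row (fixed date is an assumption
# taken from the question).
partes = pd.DataFrame({
    'year': 2023,
    'month': 9,
    'day': 10,
    'hour': CargaT['HORA'],
    'minute': CargaT['MINUTO'],
})

# pd.to_datetime builds one timestamp per row from those columns.
CargaT['FECHATRX'] = pd.to_datetime(partes)
print(CargaT)
```

The intermediate name partes is only illustrative. On large frames this kind of column-wise construction is usually faster than calling a Python function per row, so it may be worth trying before reaching for parallelism.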