I am stuck at this second last statement clueless. The error is : numpy.core._exceptions.MemoryError: Unable to allocate 58.1 GiB for an array with shape (7791676634,) and data type int64
My thinking was that merging a data frame of ~12 million records with another data frame of 3-4 more columns should not be a big deal. Please help me out. Totally stuck here. Thanks
Select_Emp_df has around 900k records and Big_df has around 12 million records and 9 columns. I just need to merge two DFs like we do vlookup in Excel on key column.
import pandas as pd
Emp_df = pd.read_csv('New_Employee_df.csv', low_memory = False )
# Append data into one data frame from three csv files of 3 years'
transactions
df2019 = pd.read_csv('U21_02767G - Customer Trade Info2019.csv',
low_memory = False )
df2021 = pd.read_csv('U21_02767G - Customer Trade
Info2021(TillSep).csv', low_memory = False)
df2020 = pd.read_csv('Newdf2020.csv', low_memory = False)
Big_df = pd.concat([df2019, df2020, df2021], ignore_index=True)
Select_Emp_df = Emp_df[['CUSTKEY','GCIF_GENDER_DSC','SEX']]
Big_df = pd.merge(Big_df, Select_Emp_df, on='CUSTKEY')
print (Big_df.info)
Just before Big_df = pd.merge(Big_df, Select_Emp_df, on='CUSTKEY')
try to delete previous dataframes. Like this.
del df2019
del df2020
del df2021
This should save some memory
also try
Select_Emp_df = Emp_df[['CUSTKEY','GCIF_GENDER_DSC','SEX']].drop_duplicates(subset=['CUSTKEY'])