Tags: python, pandas, out-of-memory

Memory-efficient alternative to str.replace()


I have a CSV file with 200k rows and about 40 columns. One column contains the special character '|', which I want to replace with '_'. However, when I do str.replace and then .append, I hit an OOM error on my machine with 16 GB of RAM. There must be a more efficient way.

My code:

import os
import pandas as pd
import numpy as np

archive_loc = ('pathname')
data = pd.read_csv(os.path.join(archive_loc,'sample.csv'))

category = data['category'].values
category = category.tolist()

for string in category:
    new_string = string.replace("|", "_")
    category.append(new_string)

Solution

  • Don't convert the column to a list and loop over it. Your loop appends to the same list it is iterating over, so the iteration never finishes and the list keeps growing until memory runs out. Do the replacement directly on the DataFrame column instead; pandas applies it to the whole column at once (a complete sketch follows the snippet below).

    data['category'] = data['category'].str.replace('|', '_', regex=False)
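
For context, here is a minimal end-to-end sketch of the fix, assuming the same 'pathname' directory and sample.csv from the question; the output file name cleaned.csv is just an illustrative choice:

import os
import pandas as pd

archive_loc = 'pathname'
data = pd.read_csv(os.path.join(archive_loc, 'sample.csv'))

# Vectorised, literal (non-regex) replacement applied to the whole column at
# once; no Python-level loop and no extra list copy of 200k strings.
data['category'] = data['category'].str.replace('|', '_', regex=False)

data.to_csv(os.path.join(archive_loc, 'cleaned.csv'), index=False)

If the file were genuinely too large to hold in memory, the same replacement could be streamed using read_csv's chunksize parameter (the chunk size and output file name below are arbitrary example values):

out_path = os.path.join(archive_loc, 'cleaned.csv')
for i, chunk in enumerate(pd.read_csv(os.path.join(archive_loc, 'sample.csv'), chunksize=50_000)):
    chunk['category'] = chunk['category'].str.replace('|', '_', regex=False)
    # Write the header only for the first chunk, then append the rest.
    chunk.to_csv(out_path, mode='w' if i == 0 else 'a', header=(i == 0), index=False)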