Tags: python, pandas, csv, text, dataset

How to read and convert a 13 GB text file in Python?


I have a 1.5 GB RAR file that, when extracted, produces a 13 GB .txt file that I can't load with anything. I used:

import pandas as pd

read_file = pd.read_csv(r'Path where the Text file is stored\File name.txt', sep='\t')
read_file.to_csv(r'Path where the CSV will be saved\File name.csv', index=False)

I ran it in PyCharm; it takes forever and eventually raises

MemoryError

I also tried Colab, and it crashed; I think that's because my Drive was almost full (I only have 15 GB of Drive storage). What should I do?


Solution

  • Read the file in chunks instead of loading it all at once: the chunksize parameter of pd.read_csv specifies the number of rows per chunk, so only one chunk is held in memory at a time, as shown below.

    import pandas as pd

    filename = r'Path where the Text file is stored\File name.txt'
    chunksize = 10 ** 6  # number of rows per chunk
    with pd.read_csv(filename, sep='\t', chunksize=chunksize, on_bad_lines='skip') as reader:
        for i, chunk in enumerate(reader):
            # each chunk is an ordinary DataFrame; here, append it to one output CSV,
            # writing the header only for the first chunk
            chunk.to_csv(r'Path where the CSV will be saved\File name.csv',
                         mode='a', index=False, header=(i == 0))
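
  • If you only need summary statistics rather than a full CSV copy, you can aggregate across chunks without writing anything back out. A minimal sketch, assuming the file has a numeric column named value (that column name is a placeholder):

    import pandas as pd

    filename = r'Path where the Text file is stored\File name.txt'
    total = 0
    row_count = 0
    with pd.read_csv(filename, sep='\t', chunksize=10 ** 6, on_bad_lines='skip') as reader:
        for chunk in reader:
            # accumulate running totals; only one chunk is held in memory at a time
            total += chunk['value'].sum()
            row_count += len(chunk)
    print('mean of value column:', total / row_count)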