Tags: python, performance, csv, tinydb

Parsing a large CSV file into TinyDB takes forever


I have a large CSV file containing 15 columns and approximately 1 million rows. I want to parse the data into TinyDB. The code I use is below:

import csv
from tinydb import TinyDB

db = TinyDB('db.monitor')
table = db.table('Current')

i = 0

datafile = open('newData.csv', newline='')
data = csv.reader(datafile, delimiter=';')

for row in data:
    table.insert({'WT_ID': row[0], 'time': row[1], 'MeanCurrent': row[2],
                  'VapourPressure': row[3], 'MeanVoltage': row[4],
                  'Temperature': row[5], 'Humidity': row[6],
                  'BarPressure': row[7], 'RPM': row[8],
                  'WindSector': row[9], 'WindSpeed': row[10],
                  'AirDensity': row[12], 'VoltageDC': row[13],
                  'PowerSec': row[14], 'FurlingAngle': row[15]})
    i = i + 1
    print(i)

However, it really takes forever. I set the variable i to track progress, and while the first lines run through quickly, it has now been more than an hour and it has only parsed about 10,000 lines, at a pace of roughly one row per second.

I couldn't find anything similar, so any help would be appreciated.

Thank you


Solution

  • Is TinyDB the best choice? You seem to need a transactional database, and TinyDB is document-oriented. On top of that, from the docs, under "Why not use TinyDB?":

    If you need advanced features or high performance, TinyDB is the wrong database for you

    Your process runs really slowly because you are accumulating data in RAM. As a workaround, you could split your CSV into smaller chunks and feed them to your script one at a time, so memory can be cleared between iterations (see the sketch below).

    TinyDB is simply not able to manage this amount of information.
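
If you stick with TinyDB anyway, a minimal sketch of the chunked approach could look like the following. It assumes the 15 field names map directly onto the 15 columns in file order (your code skips column 11, so adjust the mapping if that is intentional), and the batch size of 10,000 is an arbitrary illustrative value. insert_multiple writes a whole batch in one operation instead of one write per row:

import csv
from tinydb import TinyDB

# Field names in file order (taken from the question); BATCH_SIZE is an arbitrary choice
FIELDS = ['WT_ID', 'time', 'MeanCurrent', 'VapourPressure', 'MeanVoltage',
          'Temperature', 'Humidity', 'BarPressure', 'RPM', 'WindSector',
          'WindSpeed', 'AirDensity', 'VoltageDC', 'PowerSec', 'FurlingAngle']
BATCH_SIZE = 10000

db = TinyDB('db.monitor')
table = db.table('Current')

with open('newData.csv', newline='') as datafile:
    reader = csv.reader(datafile, delimiter=';')
    batch = []
    for row in reader:
        batch.append(dict(zip(FIELDS, row)))
        if len(batch) >= BATCH_SIZE:
            table.insert_multiple(batch)  # one write per chunk instead of one per row
            batch = []
    if batch:
        table.insert_multiple(batch)  # flush the last partial chunk

db.close()

The TinyDB documentation also describes a CachingMiddleware (tinydb.middlewares.CachingMiddleware wrapped around JSONStorage) that buffers writes in memory and only flushes them to disk when the database is closed, which helps in a similar way. Even with batching, though, a million rows is pushing TinyDB well beyond its intended use.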