Tags: mysql, sql, query-optimization, fastapi, bulkinsert

Why does FastAPI take upwards of 10 minutes to insert 100,000 rows into a SQL database?


I've tried using SQLAlchemy, as well as raw mysql.connector here, but committing an insert into a SQL database from FastAPI takes forever.

I wanted to make sure it wasn't just my DB, so I tried the same insert in a standalone local script, and it ran in a couple of seconds.

How can I work with FastAPI to make this query possible?

Thanks!

```python

# List and Depends come from typing / fastapi; router, get_raw_db, models and
# pydanticModels are project-specific and assumed to be defined elsewhere.
from typing import List
from fastapi import Depends

@router.post('/')
def postStockData(data:List[pydanticModels.StockPrices], raw_db = Depends(get_raw_db)):
  
    cursor = raw_db[0]
    cnxn = raw_db[1]

    # i = 0
    # for row in data:
    #   if i % 10 == 0:
    #     print(i)
    #     db.flush()
    #   i += 1
    #   db_pricing = models.StockPricing(**row.dict())
    #   db.add(db_pricing)
    # db.commit()
    SQL = "INSERT INTO " + models.StockPricing.__tablename__ + " VALUES (%s, %s, %s)"
    print(SQL)

    valsToInsert = []
    for row in data:
      rowD = row.dict()
      valsToInsert.append((rowD['date'], rowD['symbol'], rowD['value']))
    cursor.executemany(SQL, valsToInsert)
    cnxn.commit()

    return {'message':'Pricing Updated'}

```
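
For reference, the local script I compared against was roughly along these lines; the connection details and the sample data below are placeholders, and stock_pricing stands in for models.StockPricing.__tablename__:

```python
# Rough standalone equivalent of the route above, run outside FastAPI.
# Connection parameters, table name and sample rows are placeholders.
import time
import mysql.connector

cnxn = mysql.connector.connect(
    host="localhost", user="user", password="password", database="stocks"
)
cursor = cnxn.cursor()

SQL = "INSERT INTO stock_pricing VALUES (%s, %s, %s)"
vals_to_insert = [("2021-01-04", "AAPL", 129.41)] * 100_000  # dummy rows

start = time.time()
cursor.executemany(SQL, vals_to_insert)
cnxn.commit()
print(f"Inserted {len(vals_to_insert)} rows in {time.time() - start:.1f}s")
```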


Solution

  • You are killing performance because you are taking an "RBAR" (row by agonizing row) approach, which is not suitable for an RDBMS: you loop and execute an SQL INSERT of only one row at a time. When the RDBMS receives a query, the sequence of execution is the following:

    • checking that the user issuing the query is authenticated
    • parsing the string to verify the syntax
    • looking for metadata (tables, columns, datatypes...)
    • checking which operations on these tables and columns the user is granted
    • creating an execution plan that sequences all the operations needed for the query
    • setting up locks for concurrency
    • executing the query (inserting only 1 row)
    • returning an error or an OK message

    Every step consumes time... and you are repeating all these steps 100,000 times because of your loop.

    Usually, when inserting many rows into a table, there is just one query to execute, even if the INSERT concerns 10,000,000,000 rows coming from a file! A sketch of this single-statement approach follows below.
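
To translate that into the question's setup, send one INSERT ... VALUES (...), (...), ... statement per batch rather than one statement per row (or use MySQL's LOAD DATA INFILE when the data starts out in a file). Below is a rough sketch with mysql.connector; the table and column names (stock_pricing, date, symbol, value) and the chunk size are assumptions, since the real table name comes from models.StockPricing.__tablename__.

```python
# Sketch: one multi-row INSERT per chunk instead of one INSERT per row.
# Table/column names and the chunk size are assumptions.
import mysql.connector

CHUNK = 10_000  # rows per statement; tune against the server's max_allowed_packet

def bulk_insert(cnxn, rows):
    """rows is a list of (date, symbol, value) tuples."""
    cursor = cnxn.cursor()
    for start in range(0, len(rows), CHUNK):
        chunk = rows[start:start + CHUNK]
        # One INSERT with many VALUES groups: a single parse/plan/execute
        # round trip per chunk instead of one per row.
        placeholders = ", ".join(["(%s, %s, %s)"] * len(chunk))
        sql = "INSERT INTO stock_pricing (date, symbol, value) VALUES " + placeholders
        params = [value for row in chunk for value in row]  # flatten the tuples
        cursor.execute(sql, params)
    cnxn.commit()  # one commit for the whole batch
```

The chunking keeps each statement below the server's max_allowed_packet limit, and the single commit at the end avoids paying for a transaction per row.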