Tags: python, data-ingestion, mlops, mlrun, feature-store

MLRun ingestion, ConnectionResetError 10054


I got this error while ingesting data to Parquet in MLRun CE:

2023-09-03 14:01:47,327 [error] Unhandled exception while sending request: {'e': <class 'ConnectionResetError'>, 'e_msg': ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None), 'connection': <http.client.HTTPSConnection object at 0x000002183B39F100>}

Here is part of the call stack:

  File "C:\Python\qgate-sln-mlrun\venv\lib\site-packages\v3io\dataplane\transport\httpclient.py", line 160, in _send_request_on_connection
    connection.request(request.method, path, request.body, request.headers)
  File "C:\Users\jist\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1285, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\jist\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1331, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Users\jist\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1280, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\jist\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1079, in _send_output
    self.send(chunk)
  File "C:\Users\jist\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1001, in send
    self.sock.sendall(data)
  File "C:\Users\jist\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 1204, in sendall
    v = self.send(byte_view[count:])
  File "C:\Users\jist\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 1173, in send
    return self._sslobj.write(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

Has anyone run into and solved the same issue?


Solution

  • I figured it out: this happened when the ingested data was larger than the default limit for ParquetTarget (the default is max_events=10000). There are two ways to solve it:

    1] Change the default setting for ParquetTarget (increase the value of max_events)

    from mlrun.datastore.targets import ParquetTarget
    target = ParquetTarget(name="test", path="t01", max_events=500000)
    

    or

    2] Ingest the data in chunks; this sample uses chunks of 1,000 rows

    import pandas as pd
    import mlrun.feature_store as fstore
    for data_frm in pd.read_csv(file, chunksize=1000):
        fstore.ingest(featureset, data_frm, overwrite=False)
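    If the rows are already in memory rather than in a CSV file, the same chunking idea from option 2 can be sketched without pandas. This is only an illustration, not MLRun's API: iter_chunks, ingest_in_chunks and the ingest_one callback are hypothetical names, and ingest_one stands in for a call such as fstore.ingest(featureset, chunk, overwrite=False).

```python
from typing import Callable, Iterator, Sequence


def iter_chunks(rows: Sequence, chunk_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `rows`, each at most `chunk_size` long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def ingest_in_chunks(
    rows: Sequence,
    chunk_size: int,
    ingest_one: Callable[[Sequence], None],
) -> int:
    """Pass each chunk to `ingest_one` and return the number of chunks sent.

    `ingest_one` is a hypothetical stand-in for the real ingestion call,
    so that each request stays below the target's event limit.
    """
    sent = 0
    for chunk in iter_chunks(rows, chunk_size):
        ingest_one(chunk)
        sent += 1
    return sent


if __name__ == "__main__":
    received = []
    # 2,500 rows in chunks of 1,000: the last chunk holds the remainder.
    chunks_sent = ingest_in_chunks(list(range(2500)), 1000, received.append)
    print(chunks_sent, [len(chunk) for chunk in received])
```

    For a pandas DataFrame, the slice inside iter_chunks would be rows.iloc[start:start + chunk_size]. Keeping the chunk size well under the target's max_events is the point of the exercise.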