Tags: python, linux, neo4j, httplib, py2neo

How to fix IncompleteRead error on Linux using Py2Neo


I am updating data on a Neo4j server using Python (2.7.6) and Py2Neo (1.6.4). My load function is:

from py2neo import neo4j, node, rel, cypher

session = cypher.Session('http://my_neo4j_server.com.mine:7474')

def load_data():  
    tx = session.create_transaction()
    for row in dataframe.iterrows():  # dataframe is a pandas DataFrame
        name = row[1].name
        id = row[1].id

        merge_query = "MERGE (a:label {name:'%s', name_var:'%s'}) " % (id, name)        
        tx.append(merge_query)
    tx.commit()     

When I execute this from Spyder on Windows it works great: all the data from the dataframe is committed to Neo4j and visible in the graph. However, when I run it from a Linux server (a different machine from the Neo4j server) I get the following error at tx.commit(). Note that I have the same versions of Python and Py2Neo on both machines.

INFO:py2neo.packages.httpstream.http:>>> POST http://neo4j1.qs:7474/db/data/transaction/commit [1360120]
INFO:py2neo.packages.httpstream.http:<<< 200 OK [chunked]
ERROR:__main__:some part of process failed
Traceback (most recent call last):
  File "my_file.py", line 132, in load_data
    tx.commit()
  File "/usr/local/lib/python2.7/site-packages/py2neo/cypher.py", line 242, in commit
    return self._post(self._commit or self._begin_commit)
  File "/usr/local/lib/python2.7/site-packages/py2neo/cypher.py", line 208, in _post
    j = rs.json
  File "/usr/local/lib/python2.7/site-packages/py2neo/packages/httpstream/http.py", line 563, in json
    return json.loads(self.read().decode(self.encoding))
  File "/usr/local/lib/python2.7/site-packages/py2neo/packages/httpstream/http.py", line 634, in read
    data = self._response.read()
  File "/usr/local/lib/python2.7/httplib.py", line 543, in read
    return self._read_chunked(amt)
  File "/usr/local/lib/python2.7/httplib.py", line 597, in _read_chunked
    raise IncompleteRead(''.join(value))
IncompleteRead: IncompleteRead(128135 bytes read)

This post (IncompleteRead using httplib) suggests that this is an httplib error. I am not sure how to handle it, since I am not calling httplib directly.

Any suggestions for getting this load to work on Linux, or an explanation of what the IncompleteRead error means?

UPDATE: The IncompleteRead error is caused by a Neo4j error being returned in the response. The fragment returned in _read_chunked at the point of failure is:

pe}"}]}],"errors":[{"code":"Neo.TransientError.Network.UnknownFailure"

Neo4j docs say this is an unknown network error.
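
For reference, the fragment above can be surfaced by catching the exception outside Py2Neo and inspecting its partial payload. This is only a diagnostic sketch, assuming the stdlib httplib exception propagates out of tx.commit() as in the traceback:

import httplib  # the IncompleteRead in the traceback is raised by the stdlib httplib

try:
    load_data()
except httplib.IncompleteRead as e:
    # e.partial holds the bytes received before the response was cut off;
    # the tail of it contains the JSON "errors" array returned by Neo4j.
    print e.partial[-500:]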


Solution

  • Although I can't say for sure, this implies some kind of local network issue between client and server rather than a bug within the library. Py2neo wraps httplib (which is pretty solid itself) and, from the stack trace, it looks as though the client is expecting more chunks of a chunked response than the server actually delivers.

    To diagnose further, you could make some curl calls from your Linux application server to your database server and see what succeeds and what doesn't. If that works, try writing a quick-and-dirty Python script that makes the same calls with httplib directly, along the lines of the sketch below.
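
    As a rough starting point, a script like this exercises the same transactional endpoint seen in the log above directly through httplib (the host, port and statement here are placeholders, not your real values):

    import httplib
    import json

    # Build the same kind of request py2neo sends to the transactional endpoint.
    conn = httplib.HTTPConnection("my_neo4j_server.com.mine", 7474)
    payload = json.dumps({"statements": [
        {"statement": "MERGE (a:label {name:'x', name_var:'y'})"}
    ]})
    conn.request("POST", "/db/data/transaction/commit", payload,
                 {"Content-Type": "application/json",
                  "Accept": "application/json"})
    response = conn.getresponse()
    print response.status, response.reason
    print response.read()   # an IncompleteRead here reproduces the failure outside py2neo
    conn.close()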

    UPDATE 1: Given the update above and the fact that the server streams its responses, I'm thinking that the chunk size might represent the intended payload but the error cuts the response short. Recreating the issue with curl certainly seems like the best next step to help determine whether it is a fault in the driver, the server or something else.

    UPDATE 2: Looking again this morning, I notice that you're using Python substitution for the properties within the MERGE statement. As good practice, you should use parameter substitution at the Cypher level:

    merge_query = "MERGE (a:label {name:{name}, name_var:{name_var}})"
    merge_params = {"name": id, "name_var": name}
    tx.append(merge_query, merge_params)
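
    Applied to the original load function, the parameterised version would look roughly like this (assuming the same dataframe columns as in the question):

    def load_data():
        tx = session.create_transaction()
        for row in dataframe.iterrows():
            # Let Cypher handle the substitution instead of Python string formatting.
            merge_query = "MERGE (a:label {name:{name}, name_var:{name_var}})"
            merge_params = {"name": row[1].id, "name_var": row[1].name}
            tx.append(merge_query, merge_params)
        tx.commit()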