python json neo4j batch-processing py2neo

TypeError: not JSON serializable Py2neo Batch submit


I am creating a huge graph database with over 1.4 million nodes and 160 million relationships. My code looks as follows:

from py2neo import neo4j
# first we create all the nodes
batch = neo4j.WriteBatch(graph_db)
nodedata = []

for index, i in enumerate(words): # words is predefined
    batch.create({"term":i})
    if index%5000 == 0: #so as not to exceed the batch restrictions
        results = batch.submit()
        for x in results:
            nodedata.append(x)
        batch = neo4j.WriteBatch(graph_db)

results = batch.submit()
for x in results:
    nodedata.append(x)

#nodedata contains all the node instances now
#time to create relationships

batch = neo4j.WriteBatch(graph_db)
for iindex, i in enumerate(weightdata): # weightdata is predefined
    # the real code picks the nodedata indexes differently;
    # iindex and -iindex are used here just as an example
    batch.create((nodedata[iindex], "rel", nodedata[-iindex], {"weight": i}))
    if iindex%5000 == 0: # again batch constraints
        batch.submit() # this is the line that raises the error
        batch = neo4j.WriteBatch(graph_db)
batch.submit()

I am getting the following error:

Traceback (most recent call last):
  File "test.py", line 53, in <module>
    batch.submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2116, in submit
    for response in self._submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2085, in _submit
    for id_, request in enumerate(self.requests)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 427, in _send
    return self._client().send(request)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 351, in send
    rs = self._send_request(request.method, request.uri, request.body, request.$
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 326, in _send_re$
    data = json.dumps(data, separators=(",", ":"))
  File "/usr/lib64/python2.6/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
  File "/usr/lib64/python2.6/json/encoder.py", line 367, in encode
    chunks = list(self.iterencode(o))
  File "/usr/lib64/python2.6/json/encoder.py", line 306, in _iterencode
    for chunk in self._iterencode_list(o, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 204, in _iterencode_list
    for chunk in self._iterencode(value, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 309, in _iterencode
    for chunk in self._iterencode_dict(o, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 275, in _iterencode_dict
    for chunk in self._iterencode(value, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 317, in _iterencode
    for chunk in self._iterencode_default(o, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 323, in _iterencode_default
    newobj = self.default(o)
  File "/usr/lib64/python2.6/json/encoder.py", line 344, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: 3448 is not JSON serializable

Could anybody please explain what exactly is happening here and how I can overcome it? Any kind of help would be appreciated. Thanks in advance! :)


Solution

  • It's hard to tell without being able to run your code with the same data set, but this is likely caused by the type of the items in weightdata.

    Step through your code, or print the type of each value as you go, to determine what type i has within the {"weight": i} portion of the relationship descriptor. The traceback shows that the message is built from repr(o), so a value that prints as 3448 may still not be a native Python int or float, which is what the json module needs in order to write a JSON number. If this theory is correct, you will need to cast or otherwise convert that property value into a native number (for example with int(i)) before using it in a property set.
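    A minimal sketch of that check and conversion, using only the standard library. The WrappedInt class, the to_json_number helper and the sample weightdata values are hypothetical stand-ins invented for illustration; the real type of the items in weightdata is not known from the question.

```python
import json


class WrappedInt(object):
    """Hypothetical stand-in for a number-like type that json cannot encode."""

    def __init__(self, value):
        self.value = value

    def __int__(self):
        return self.value

    def __repr__(self):
        # Prints like a plain integer, which is why the error message is confusing.
        return str(self.value)


def to_json_number(value):
    """Return a native int or float that json.dumps can serialise."""
    if isinstance(value, (int, float)):
        return value
    # Assumption: anything else is integer-like and supports int().
    return int(value)


# Made-up sample data: one wrapped value among native numbers.
weightdata = [WrappedInt(3448), 7, 2.5]

# Step 1: print the type next to the repr to spot the odd one out.
for w in weightdata:
    print("%s %r" % (type(w).__name__, w))

# Step 2: confirm that the wrapped value is what json rejects.
try:
    json.dumps({"weight": weightdata[0]})
    failed = False
except TypeError:
    failed = True

# Step 3: convert every weight before building the property dicts.
clean = [to_json_number(w) for w in weightdata]
encoded = json.dumps([{"weight": w} for w in clean])
```

    Note that int() discards any fractional part, so if the weights are meant to be fractional, float() would be the safer conversion in the helper.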