I am creating a huge graph database with over 1.4 million nodes and 160 million relationships. My code looks as follows:
from py2neo import neo4j

# graph_db is the neo4j.GraphDatabaseService connection created earlier

# first we create all the nodes
batch = neo4j.WriteBatch(graph_db)
nodedata = []
for index, i in enumerate(words):  # words is predefined
    batch.create({"term": i})
    if index % 5000 == 0:  # so as not to exceed the batch restrictions
        results = batch.submit()
        for x in results:
            nodedata.append(x)
        batch = neo4j.WriteBatch(graph_db)
results = batch.submit()
for x in results:
    nodedata.append(x)
# nodedata contains all the node instances now

# time to create relationships
batch = neo4j.WriteBatch(graph_db)
for iindex, i in enumerate(weightdata):  # weightdata is predefined
    # I actually pick the nodedata indexes differently; iindex and -iindex are just an example
    batch.create((nodedata[iindex], "rel", nodedata[-iindex], {"weight": i}))
    if iindex % 5000 == 0:  # again the batch constraints
        batch.submit()  # this is the line that throws the error
        batch = neo4j.WriteBatch(graph_db)
batch.submit()
I am getting the following error:
Traceback (most recent call last):
  File "test.py", line 53, in <module>
    batch.submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2116, in submit
    for response in self._submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2085, in _submit
    for id_, request in enumerate(self.requests)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 427, in _send
    return self._client().send(request)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 351, in send
    rs = self._send_request(request.method, request.uri, request.body, request.$
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 326, in _send_re$
    data = json.dumps(data, separators=(",", ":"))
  File "/usr/lib64/python2.6/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
  File "/usr/lib64/python2.6/json/encoder.py", line 367, in encode
    chunks = list(self.iterencode(o))
  File "/usr/lib64/python2.6/json/encoder.py", line 306, in _iterencode
    for chunk in self._iterencode_list(o, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 204, in _iterencode_list
    for chunk in self._iterencode(value, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 309, in _iterencode
    for chunk in self._iterencode_dict(o, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 275, in _iterencode_dict
    for chunk in self._iterencode(value, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 317, in _iterencode
    for chunk in self._iterencode_default(o, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 323, in _iterencode_default
    newobj = self.default(o)
  File "/usr/lib64/python2.6/json/encoder.py", line 344, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: 3448 is not JSON serializable
Could anybody please explain what exactly is happening here and how I can overcome it? Any help would be appreciated. Thanks in advance! :)
It's hard to tell without being able to run your code with the same data set, but this is likely to be caused by the type of the items in weightdata.
Step through your code or print the data type as you go to determine what the type of i is within the {"weight": i} portion of the relationship descriptor. You may find that this is not an int, which is what JSON number serialisation would require.
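For instance, a quick check over the first few values (a minimal sketch, reusing the weightdata name from your code) will reveal the offending type:

for i in weightdata[:5]:
    print(type(i))  # e.g. <type 'numpy.int32'> rather than <type 'int'>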
If this theory is correct, you will need to find a way to cast or otherwise convert that property value into an int before using it in a property set.
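As a concrete illustration, here is a minimal sketch assuming the weights are numpy scalars (e.g. numpy.int32 values loaded with numpy), which is a common way to hit exactly this error:

import json
import numpy as np  # assumption: weightdata was produced with numpy

w = np.int32(3448)               # a numpy scalar, not a plain Python int
# json.dumps({"weight": w})      # would raise TypeError: not JSON serializable
json.dumps({"weight": int(w)})   # fine once the value is cast to a plain int

Applied to your relationship loop, that means building the property map as {"weight": int(i)} (or float(i) if the weights are fractional) before handing it to batch.create.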