
Weaviate - push records in batch errors with JSONDecodeError


I’m trying to add records in a batch, but after I add my objects to the batch, I always get a JSONDecodeError when I assume the batch is being sent to my Weaviate class.

client.batch.configure(batch_size=100, dynamic=False, timeout_retries=3,
                       callback=weaviate.util.check_batch_result,
                       consistency_level=weaviate.data.replication.ConsistencyLevel.ALL)
with client.batch as batch:
    for el_idx, el in enumerate(send_to_weaviate):
        batch.add_data_object(el, "MyClass")

Records look like this:

send_to_weaviate[0]
{'my_id': '3c2466b7e7da201c66f42ea362874343',
 'post_timestamp': ['1644883202000', '1644883242000'],
 'dist_metric': [0, 0]}

Schema looks like this:

class_obj = {
        "class": "MyClass",
        "description": "Description",
        "properties": [{
            "dataType": ["text"],
            "description": "ID",
            "name": "my_id"
        },  {
            "dataType": ["text[]"],
            "description": "Timestamps",
            "name": "post_timestamp"
        },  {
            "dataType": ["int[]"],
            "description": "Description",
            "name": "dist_metric"
        }]
    }

Error message:

File ~/opt/anaconda3/envs/scripts/lib/python3.9/site-packages/weaviate/batch/crud_batch.py:644, in Batch._create_data(self, data_type, batch_request)
    642     connection_count += 1
    643 else:
--> 644     response_json = response.json()
    645     if (
    646         self._weaviate_error_retry is not None
    647         and batch_error_count < self._weaviate_error_retry.number_retries
    648     ):
    649         batch_to_retry, response_json_successful = self._retry_on_error(
    650             response_json, data_type
    651         )

File ~/opt/anaconda3/envs/scripts/lib/python3.9/site-packages/requests/models.py:975, in Response.json(self, **kwargs)
    971     return complexjson.loads(self.text, **kwargs)
    972 except JSONDecodeError as e:
    973     # Catch JSON-related errors and raise as requests.JSONDecodeError
    974     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 975     raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

All of the Googling I've done indicates that a JSONDecodeError occurs when your payload is not JSON-serializable. But Weaviate supports lists of common data types as properties (e.g., the int[] and text[] data types), and JSON itself has no problem with arrays of ints or strings as values. When I try to JSON-serialize my send_to_weaviate variable, I have no problems either, so serialization may not be the true cause. Also, judging by the traceback, the error is raised while decoding the server's response (response.json()), not while encoding my payload.

import json
json.loads(json.dumps(send_to_weaviate))  # No errors
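
For what it's worth, the exact message in the traceback is what Python's json module raises when asked to parse an empty string, which makes me suspect the server (or something in front of it, like a proxy or load balancer) returned an empty or non-JSON body, rather than my payload being the problem:

```python
import json

# Parsing an empty HTTP response body reproduces the exact error text
# from the traceback above.
try:
    json.loads("")
except json.JSONDecodeError as e:
    print(e)  # Expecting value: line 1 column 1 (char 0)
```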

Can anyone help me figure out why my batch fails to add to Weaviate?

EDIT: Here is a small example that reproduces my issue. I'm using weaviate-client v3.22.1 in a Python 3.9 conda environment.

import weaviate
client = weaviate.Client(URL_TO_WEAVIATE_ENDPOINT)
class_obj = {
        "class": "MyClass",
        "description": "Description",
        "properties": [{
            "dataType": ["text"],
            "description": "ID",
            "name": "my_id"
        },  {
            "dataType": ["text[]"],
            "description": "Timestamps",
            "name": "post_timestamp"
        },  {
            "dataType": ["int[]"],
            "description": "Description",
            "name": "dist_metric"
        }]
    }
client.schema.create_class(class_obj)
client.batch.configure(batch_size=100, dynamic=False, timeout_retries=3)
# Just try to add 1 doc
doc = {'my_id': '3c2466b7e7da201c66f42ea362874343','post_timestamp': ['1644883202000', '1644883242000'], 'dist_metric': [0, 0]}
with client.batch() as batch:
    batch.add_data_object(doc, "MyClass")

Solution

  • When I set batch_size = 10, things started working again. I think that because my records were quite character-rich, batches needed to be smaller than the batch_size=100 I had configured. You can set batch_size with client.batch.configure(batch_size=10), or in the context manager itself with client.batch(batch_size=10) as batch:, etc.

    Also, oddly, deleting the client object and re-establishing the connection seemed to help, though I can't explain why.
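
If tuning batch_size via configure doesn't take effect for some reason, a workaround is to chunk the records yourself and open a fresh batch context per chunk, so each request to the server stays small. A minimal sketch (the chunked helper is my own; the commented usage assumes the client and send_to_weaviate records from the question):

```python
def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical usage with the client from the question; each `with`
# block flushes its batch on exit, so requests stay small:
# for chunk in chunked(send_to_weaviate, 10):
#     with client.batch as batch:
#         for el in chunk:
#             batch.add_data_object(el, "MyClass")

print([len(c) for c in chunked(list(range(25)), 10)])  # -> [10, 10, 5]
```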