I currently have a Weaviate database that contains 12 schemas and another which contains 10 schemas. Is it possible to just take the data for the 10 schemas and put them into the other database? If so, what is the best way to do so?
Yes, you can export all collections from one Weaviate database and import them into another. (A bit of terminology: the Weaviate schema contains the class definitions, and a collection consists of all objects of the same class.) Since you haven't mentioned a specific programming language, here's a solution in pseudocode / Python.
First you need to initialize the clients for the source and target databases, say source_db_client and target_db_client.
Then you need to get the schema of the source database (the one with 10 collections in your case).
schema = source_db_client.schema.get()
Then for each class in the schema (for c in schema['classes']), fetch its definition from the source database and create it in the target database:
class_def = source_db_client.schema.get(c['class']) # the class name
target_db_client.schema.create_class(class_def)
Python code could look like this:
# Uses the v3 Python client API (weaviate.Client)
import weaviate

# Clients for the source and target instances
source_db_client = weaviate.Client('http://localhost:8080')
target_db_client = weaviate.Client('http://localhost:8081')
batch_size = 100

schema = source_db_client.schema.get()
for c in schema['classes']:
    class_name = c['class']

    # Recreate the class definition in the target instance
    class_def = source_db_client.schema.get(class_name)
    target_db_client.schema.create_class(class_def)

    # Skip copying cross-reference properties (the class names below are placeholders)
    class_properties = [prop['name'] for prop in class_def['properties'] if prop['dataType'] not in [['crossReferencedClass1'], ['crossReferencedClass2'], ...]]

    cursor = None
    with target_db_client.batch(batch_size=batch_size) as batch:
        # Batch import all objects to the target instance
        while True:
            # From the SOURCE instance, get the next group of objects
            query = (
                source_db_client.query.get(class_name, class_properties)
                .with_additional(['id', 'vector'])
                .with_limit(batch_size)
            )
            if cursor is not None:
                query = query.with_after(cursor)
            results = query.do()
            if 'errors' in results:
                raise Exception(results['errors'])

            objects = results['data']['Get'][class_name]
            # If empty, we're finished
            if len(objects) == 0:
                break

            # Otherwise, add the objects to the batch to be added to the target instance
            for retrieved_object in objects:
                new_object = dict()
                for prop in class_properties:
                    new_object[prop] = retrieved_object[prop]
                batch.add_data_object(
                    new_object,
                    class_name=class_name,
                    vector=retrieved_object['_additional']['vector']
                )

            # Update the cursor to the id of the last retrieved object
            cursor = objects[-1]['_additional']['id']
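The script above relies on a hand-written placeholder list of cross-referenced class names. As a possible alternative, here is a minimal sketch that derives the list of copyable properties from the class definition itself. It assumes that a cross-reference can be recognised by a dataType entry starting with an uppercase letter (class names are capitalized, primitive types such as text or int[] are not). The helper names is_cross_reference and copyable_property_names, and the example class definition, are made up for illustration.

```python
def is_cross_reference(prop):
    """Return True if a property definition looks like a cross-reference.

    Assumption: class names start with an uppercase letter, while
    primitive data types ('text', 'int', 'number[]', ...) start lowercase.
    """
    return any(data_type[:1].isupper() for data_type in prop['dataType'])


def copyable_property_names(class_def):
    """Names of the properties that are not cross-references."""
    return [
        prop['name']
        for prop in class_def.get('properties', [])
        if not is_cross_reference(prop)
    ]


# Hypothetical class definition, shaped like the result of schema.get(class_name)
example_class_def = {
    'class': 'Article',
    'properties': [
        {'name': 'title', 'dataType': ['text']},
        {'name': 'wordCount', 'dataType': ['int']},
        {'name': 'tags', 'dataType': ['text[]']},
        {'name': 'hasAuthor', 'dataType': ['Author']},
    ],
}

print(copyable_property_names(example_class_def))  # ['title', 'wordCount', 'tags']
```

In the script, class_properties = copyable_property_names(class_def) could then stand in for the list comprehension with the placeholder names.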
You can find TypeScript code equivalent to the migration script in the Weaviate documentation under How-to: Manage data -> Read all objects -> Restore to a target instance.
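Once the copy has run, a quick sanity check is to compare the number of objects per class in both instances. The sketch below does this with the v3 client's aggregate query and its meta count. The function names count_objects and find_count_mismatches are hypothetical, and the clients are assumed to be v3 weaviate.Client objects like the ones created in the migration script.

```python
def count_objects(client, class_name):
    """Ask one instance how many objects it holds for class_name."""
    result = client.query.aggregate(class_name).with_meta_count().do()
    if 'errors' in result:
        raise RuntimeError(result['errors'])
    return result['data']['Aggregate'][class_name][0]['meta']['count']


def find_count_mismatches(source_client, target_client, class_names):
    """Return {class_name: (source_count, target_count)} where the counts differ."""
    mismatches = {}
    for class_name in class_names:
        source_count = count_objects(source_client, class_name)
        target_count = count_objects(target_client, class_name)
        if source_count != target_count:
            mismatches[class_name] = (source_count, target_count)
    return mismatches
```

With the names from the migration script, the call would be find_count_mismatches(source_db_client, target_db_client, [c['class'] for c in schema['classes']]). An empty dict means every class has the same object count on both sides. It does not prove that the property values or vectors are identical.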