We are transferring a table from BigQuery to MongoDB. When we export from BQ to GCS as NDJSON, the ints are converted into strings in the output JSON. This is a problem, as we effectively lose the int type to strings...
We use pymongo's collection.update_many()
function to fix the types. We query for BigQuery metadata to get the correct types
, like so:
# connect to mongo and bigquery
bq = bigquery.Client()
cluster = MongoClient(MONGO_URI)
db = cluster["cbbap"]
# get schemas from bigquery
full_query = f"select table_name, column_name, data_type from `our_gcp_project.our_dataset.INFORMATION_SCHEMA.COLUMNS` where table_name = 'our_bq_table'"
types_df = bq.query(full_query).to_dataframe()
# create set object for 2nd parameter to update_many()
update_ints_dict = { '$set': {} };
for index, row in results_df.iterrows():
col_name = row['column_name']
if row['data_type'] == 'INT64':
update_ints_dict['$set'][col_name] = { '$toInt': f'${col_name}' }
# # update_ints_dict looks like this
# {'$set': {
# 'teamId': {'$toInt': '$teamId'},
# 'teamIdAgst': {'$toInt': '$teamIdAgst'},
# 'gameId': {'$toInt': '$gameId'},
# ... 70 more of these
# }}
# and update in mongo
my_collection = db[mongo_table]
my_collection.update_many({}, [update_ints_dict])
my_collection
is about 1GB in size, with 300K documents and, as indicated above, about 70 fields are being converted across these 300K documents. So its a relatively large update_many()
.
We received the following error message from the my_collection.update_many()
... Note that after receiving this error message, we reran the exact same code a 2nd time, and it successfully ran the 2nd time. Still, we are trying to understand why we received this error in the first place:
pymongo.errors.NotPrimaryError: Exec error resulting in state FAILURE :: caused by :: operation was interrupted,
full error: {
'writeConcernError': {
'code': 11602,
'codeName': 'InterruptedDueToReplStateChange',
'errmsg': 'operation was interrupted'
},
'operationTime': Timestamp(1628020339, 1),
'ok': 0.0,
'errmsg': 'Exec error resulting in state FAILURE :: caused by :: operation was interrupted',
'code': 11602,
'codeName': 'InterruptedDueToReplStateChange',
'$clusterTime': {
'clusterTime': Timestamp(1628020339, 1),
'signature': {
'hash': b'some-hash-string-here',
'keyId': 1234567890
}
}
}
collection.update_many()
, to make these type conversions? Perhaps there is a faster way to make type versions in the json BEFORE we upload to mongo?You had a replica set election because either 1) the current primary experienced a problem and quit working or 2) a new node was added with the highest priority and it took over the primary role from the previous primary.
To investigate and troubleshoot the first possibility, review the logs on the server that was the primary initially.
To investigate and troubleshoot the second possibility, look at your replica set configuration.
If you are using a modern MongoDB driver the error should have been automatically retried by the driver, unless you disabled retryable writes.