We know that MongoDB has two ways of modeling relationships between entities, namely embedding and referencing (see the difference here). Let's say we have a USER database with two tables in MySQL named user and address. An embedded MongoDB document might look like this:
{
  "_id": 1,
  "name": "Ashley Peacock",
  "addresses": [
    {
      "address_line_1": "10 Downing Street",
      "address_line_2": "Westminster",
      "city": "London",
      "postal_code": "SW1A 2AA"
    },
    {
      "address_line_1": "221B Baker Street",
      "address_line_2": "Marylebone",
      "city": "London",
      "postal_code": "NW1 6XE"
    }
  ]
}
Whereas in a referenced relation, the two SQL tables become two collections in MongoDB, which can be migrated with this approach using pymongo.
How can we directly migrate MySQL data as embedded documents using Python?
Insights about pseudocode and the performance of the algorithm would be highly useful. Something that comes to mind is creating views by performing joins in MySQL, but in that case we would not have the structure of a child document nested inside a parent document.
First, for canonical reference, the question of "embedded" vs. "referenced" data is called denormalization. Mongo has a guide describing when you should denormalize. Knowing when and how to denormalize is a very common hang-up when moving from SQL to NoSQL and getting it wrong can erase any performance benefits you might be looking for. I'll assume you've got this figured out since you seem set on using an embedded approach.
Mongo has a great Python tutorial you may want to refer to. First, join your user and address tables. The result will look something like this:
| _id | name           | address_line_1        | address_line_2 | ...
|   1 | Ashley Peacock | 10 Downing Street ... | ...
|   1 | Ashley Peacock | 221B Baker Street ... | ...
|   2 | Bob Jensen     | 343 Main Street ...   | ...
|   2 | Bob Jensen     | 1223 Some Ave ...     | ...
...
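If it helps, here is a minimal sketch of pulling that joined result into Python with mysql-connector-python. The credentials, the database name, and the address.user_id foreign key are assumptions based on your example data, so adjust them to your schema:
import mysql.connector

# A sketch only: adjust host/credentials/database and column names
# to match your actual MySQL schema.
conn = mysql.connector.connect(
    host="localhost", user="root", password="secret", database="USER"
)
cursor = conn.cursor(dictionary=True)  # rows come back as dicts

cursor.execute(
    """
    SELECT u.id AS _id, u.name,
           a.address_line_1, a.address_line_2, a.city, a.postal_code
    FROM user u
    JOIN address a ON a.user_id = u.id
    """
)
sql_data = cursor.fetchall()  # one row per (user, address) pair
fetchall is fine for small tables; for very large ones, iterate over the cursor instead of materializing everything in memory at once.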
Then iterate over the rows to create your documents and pass them to pymongo's update_one. Using upsert=True with update_one inserts a new document when no match is found in the database, and updates the existing document when one is found. $push appends the address data to the addresses array field in the document, and plain fields such as name go under $set so they can be combined with $push in the same update. With this setup, update_one automatically handles duplicates and appends addresses based on matching _id fields. See the docs for more details:
from pymongo import MongoClient

client = MongoClient(port=27017)
db = client.my_db

sql_data = []  # should have your SQL table data

# depending on how you got this into python, you will index with a
# field name or a number, e.g. row["id"] or row[0]
for row in sql_data:
    address = {
        "address_line_1": row["address_line_1"],
        "address_line_2": row["address_line_2"],
        "city": row["city"],
        "postal_code": row["postal_code"],
    }
    db.users.update_one(
        {"_id": row["_id"]},
        {
            # plain fields must go under $set when mixed with other operators
            "$set": {"name": row["name"]},
            # $push appends this address to the addresses array,
            # creating the array on the first upsert
            "$push": {"addresses": address},
        },
        upsert=True,
    )
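On the performance point: the loop above does one round trip to MongoDB per address row. If the tables are large, the same upserts can be batched with bulk_write, roughly like this sketch (reusing the sql_data rows and column names from above):
from pymongo import MongoClient, UpdateOne

client = MongoClient(port=27017)
db = client.my_db

BATCH_SIZE = 1000  # flush in chunks to keep memory bounded
requests = []
for row in sql_data:  # sql_data: the joined MySQL rows, as above
    address = {
        "address_line_1": row["address_line_1"],
        "address_line_2": row["address_line_2"],
        "city": row["city"],
        "postal_code": row["postal_code"],
    }
    requests.append(
        UpdateOne(
            {"_id": row["_id"]},
            {"$set": {"name": row["name"]}, "$push": {"addresses": address}},
            upsert=True,
        )
    )
    if len(requests) >= BATCH_SIZE:
        db.users.bulk_write(requests, ordered=False)
        requests = []

if requests:  # flush the final partial batch
    db.users.bulk_write(requests, ordered=False)
ordered=False lets the server continue past individual write errors instead of stopping at the first one; drop it if you need the operations applied strictly in order.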