Inserting data using PyMongo based on a defined data model


I have a dataset consisting of 250 rows that looks like the following:

[Image of the dataset not available]

In MongoDB Compass, I inserted the first row as follows:

db.employees.insertOne({"employee_id": 412153,
                        "first_name": "Carrol",
                        "last_name": "Dhin",
                        "email": "carrol.dhin@company.com",
                        "managing": [{"manager_id": 412153, "employee_id": 174543}],
                        "department": [{"department_name": "Accounting", "department_budget": 500000}],
                        "laptop": [{"serial_number": "CSS49745",
                                    "manufacturer": "Lenovo",
                                    "model": "X1 Gen 10",
                                    "date_assigned": ISODate("2022-01-15"),
                                    "installed_software": ["MS Office", "Adobe Acrobat", "Slack"]}]})

If I wanted to insert all 250 rows into the database using PyMongo in Python, how would I ensure that every row is entered following the format that I used when I inserted it manually in the Mongo shell?


Solution

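    from pymongo import MongoClient
    import pandas as pd

    client = MongoClient('localhost', 27017)
    db = client.MD
    collection = db.gammaCorp

    # Parse date_assigned as a datetime so MongoDB stores a date, not a string
    df = pd.read_csv(' ', parse_dates=['date_assigned'])  # insert CSV name here

    for i in df.index:
        # Build a new dict for every row: insert_one adds an _id to the dict it
        # is given, so reusing one dict raises DuplicateKeyError on the second row
        data = {}
        # int() converts NumPy integers, which PyMongo cannot encode, to Python
        # ints (assumes no empty cells in these columns)
        data['employee_id'] = int(df['employee_id'][i])
        data['first_name'] = df['first_name'][i]
        data['last_name'] = df['last_name'][i]
        data['email'] = df['email'][i]
        # Each nested field is a list holding one sub-document, as in the manual insert
        data['managing'] = [{'manager_id': int(df['employee_id'][i]), 'employee_id': int(df['managing'][i])}]
        data['department'] = [{'department_name': df['department'][i], 'department_budget': int(df['department_budget'][i])}]
        data['laptop'] = [{'serial_number': df['serial_number'][i],
                           'manufacturer': df['manufacturer'][i],
                           'model': df['model'][i],
                           'date_assigned': df['date_assigned'][i].to_pydatetime(),
                           # Assumes the CSV cell is a comma-separated string
                           'installed_software': [s.strip() for s in df['installed_software'][i].split(',')]}]

        collection.insert_one(data)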
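
If you want to check the shape of the documents before touching the database, one option is to put the mapping in a small function and run it on a sample row first. The sketch below is only an illustration. It swaps pandas for the standard-library csv module so it runs without any dependencies. The function name row_to_document, the sample CSV text, the MM-DD-YYYY date format and the comma-separated installed_software cell are all assumptions, since the real file is not shown in the question.

```python
import csv
import io
from datetime import datetime

# Hypothetical one-row stand-in for the real CSV file (column names taken
# from the code above; the cell formats are assumptions)
SAMPLE_CSV = '''employee_id,first_name,last_name,email,managing,department,department_budget,serial_number,manufacturer,model,date_assigned,installed_software
412153,Carrol,Dhin,carrol.dhin@company.com,174543,Accounting,500000,CSS49745,Lenovo,X1 Gen 10,01-15-2022,"MS Office, Adobe Acrobat, Slack"
'''


def row_to_document(row):
    """Map one flat CSV row (a dict of strings) onto the nested document."""
    return {
        "employee_id": int(row["employee_id"]),
        "first_name": row["first_name"],
        "last_name": row["last_name"],
        "email": row["email"],
        "managing": [{"manager_id": int(row["employee_id"]),
                      "employee_id": int(row["managing"])}],
        "department": [{"department_name": row["department"],
                        "department_budget": int(row["department_budget"])}],
        "laptop": [{
            "serial_number": row["serial_number"],
            "manufacturer": row["manufacturer"],
            "model": row["model"],
            # Assumed MM-DD-YYYY format, as in the date shown in the question
            "date_assigned": datetime.strptime(row["date_assigned"], "%m-%d-%Y"),
            # Assumed comma-separated list inside a single quoted cell
            "installed_software": [s.strip() for s in
                                   row["installed_software"].split(",")],
        }],
    }


# Build every document up front so the structure can be inspected
documents = [row_to_document(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]

# With a running server and the collection from the answer above, the whole
# batch could then be written in a single call:
# collection.insert_many(documents)
```

Because every row goes through the same function, all 250 documents end up with the same nesting as the manually inserted one, and insert_many sends them in one batch instead of one call per row.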