How to bulk transfer JSON via MongoEngine


I'm trying to send a lot of data to MongoDB through MongoEngine. I start with a DataFrame, which I serialize to JSON like this:

result = df.to_json(orient="index")
parsed = json.loads(result)
json_data = json.dumps(parsed, indent=4) 

I then make it a little prettier using this:

json_object = json.loads(json_data)
json_formatted_str = json.dumps(json_object, indent=2)
print(json_formatted_str)

This is the result:

{
  "0": {
    "Address": " Bursvej 30 ",
    "Zip/city": "4930 Maribo",
    "Price": " 148.000kr. ",
    "Date": 1673371545635
  },
  "1": {
    "Address": " Garrdesmuttevej 20 ",
    "Zip/city": "9550 Mariager",
    "Price": " 148.000kr. ",
    "Date": 1673371545635
  },
  "2": {
    "Address": " Norrevej 21 ",
    "Zip/city": "6990 Ulfborg",
    "Price": " 150.000kr. ",
    "Date": 1673371545635
  },

But when I try to send it to MongoDB:

MD = [MarketData(**data) for data in json_formatted_str]
MarketData.objects.insert(MD, load_bulk=False)

I get this error:

TypeError: __main__.MarketData() argument after ** must be a mapping, not str

Is there any other way to do this? I have been trying with PyMongo for several hours but gave up. Should I go back to that? Would prefer MongoEngine to be honest.

Thanks in advance

EDIT:

Dataframe

   Address             Zip/city         Price        Date
0  Bursøvej 30, Bursø  4930 Maribo      148.000 kr.  2023-01-10 17:25:45.635483
1  Gærdesmuttevej 20   9550 Mariager    148.000 kr.  2023-01-10 17:25:45.635483
2  Nørrevej 21         6990 Ulfborg     150.000 kr.  2023-01-10 17:25:45.635483
3  Egernvænget 54      4733 Tappernøje  195.000 kr.  2023-01-10 17:25:45.635483
4  Egernvænget 56      4733 Tappernøje  195.000 kr.  2023-01-10 17:25:45.635483

And my schema

class MarketData(Document):
    #answers = DictField()
    Address = DynamicField(required=False)
    city = DynamicField(required=False)
    Price = DynamicField(required=False)
    date = DynamicField(required=False)
    
    def json(self):
        market_dict = {
            "username": self.username,
            "city": self.city,
            "Price": self.price
        }
        return json.dumps(market_dict)

Solution

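  • You probably want to skip the round trip through a JSON string entirely. The error comes from iterating over json_formatted_str: looping over a string yields single characters, and ** needs a mapping. Instead, convert the DataFrame directly to a list of dictionaries, one per row. You could also iterate over the rows of the DataFrame yourself, but let's start with: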

    MD = [
        MarketData(**data)
        for data
        in df.to_dict(orient="records")
    ]
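
To make the difference concrete, here is a small sketch that uses only the standard library, so it runs without pandas or a MongoDB connection. The two sample rows are copied from the question. FIELD_MAP is a hypothetical mapping I introduced for illustration: the schema declares city and date while the data uses Zip/city and Date, and, as far as I know, a regular (non-dynamic) MongoEngine Document rejects keyword arguments that are not declared fields, so the keys likely need renaming first.

```python
import json

# Two rows in the same shape that df.to_json(orient="index") produced in the question.
json_formatted_str = json.dumps(
    {
        "0": {"Address": " Bursvej 30 ", "Zip/city": "4930 Maribo",
              "Price": " 148.000kr. ", "Date": 1673371545635},
        "1": {"Address": " Garrdesmuttevej 20 ", "Zip/city": "9550 Mariager",
              "Price": " 148.000kr. ", "Date": 1673371545635},
    },
    indent=2,
)

# Iterating over a str yields one character at a time, so the first "data"
# in the original list comprehension was just "{" and ** rejected it.
first_item = next(iter(json_formatted_str))

# Parsing the string back gives a dict keyed by row index; its values are
# the per-row mappings that ** needs.
records = list(json.loads(json_formatted_str).values())

# Hypothetical mapping from DataFrame column names to the schema's field names.
FIELD_MAP = {"Zip/city": "city", "Date": "date"}
cleaned = [
    {FIELD_MAP.get(key, key): value for key, value in row.items()}
    for row in records
]

print(repr(first_item))
print(cleaned[0])

# With the question's MarketData class and an open connection, the bulk
# insert would then be:
#   MD = [MarketData(**row) for row in cleaned]
#   MarketData.objects.insert(MD, load_bulk=False)
```

With pandas, df.rename(columns=FIELD_MAP).to_dict(orient="records") should give an equivalent list in one step without going through JSON at all. The one difference to expect is that Date stays a pandas Timestamp instead of becoming epoch milliseconds.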