
bson.errors.InvalidDocument: key '$numberDecimal' must not start with '$' when using json


I have a small JSON file with the following content:

{
    "IdTitulo": "Jaws",
    "IdDirector": "Steven Spielberg",
    "IdNumber": 8,
    "IdDecimal": "2.33"
}

And there is a schema validator on my collection, test_dec. This is what I used to create it:

db.createCollection("test_dec", {
    validator: {
        $jsonSchema: {
            bsonType: "object",
            required: ["IdTitulo", "IdDirector"],
            properties: {
                IdTitulo: {
                    "bsonType": "string",
                    "description": "string type, movie title"
                },
                IdDirector: {
                    "bsonType": "string",
                    "description": "string type, director name"
                },
                IdNumber: {
                    "bsonType": "int",
                    "description": "number type to test"
                },
                IdDecimal: {
                    "bsonType": "decimal",
                    "description": "decimal type"
                }
            }
        }
    }
})
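For reference, the same validator can be created from Python. This is a sketch assuming PyMongo, whose `Database.create_collection` passes extra keyword arguments such as `validator` through to the server's `create` command; `TEST_DEC_VALIDATOR` and `create_test_dec` are hypothetical names. Note what the schema implies: `"int"` means BSON int32 and `"decimal"` means Decimal128, so `IdDecimal` has to reach the server as a Decimal128; a string or a float (BSON double) will fail validation.

```python
# Hypothetical constant: the shell validator above expressed as a Python dict.
TEST_DEC_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["IdTitulo", "IdDirector"],
        "properties": {
            "IdTitulo": {"bsonType": "string", "description": "string type, movie title"},
            "IdDirector": {"bsonType": "string", "description": "string type, director name"},
            "IdNumber": {"bsonType": "int", "description": "number type to test"},
            "IdDecimal": {"bsonType": "decimal", "description": "decimal type"},
        },
    }
}


def create_test_dec(db):
    """Create test_dec with the validator (db is assumed to be a pymongo Database)."""
    # Extra keyword arguments to create_collection are sent as options of the
    # server's create command, so validator= attaches the schema at creation time.
    return db.create_collection("test_dec", validator=TEST_DEC_VALIDATOR)
```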

I've made multiple attempts to insert the data. The problem is the value of the IdDecimal field.

Some of the variants I tried for the IdDecimal line:

 "IdDecimal": 2.33

 "IdDecimal": {"$numberDecimal": "2.33"}

 "IdDecimal": NumberDecimal("2.33")

None of them works. The second one is the form documented in the MongoDB manual (MongoDB Extended JSON), and it produces the error in the title: bson.errors.InvalidDocument: key '$numberDecimal' must not start with '$'.
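The error follows from the parser rather than from MongoDB: Python's built-in `json` module knows nothing about Extended JSON, so `{"$numberDecimal": "2.33"}` comes back as an ordinary nested dict, and the document handed to the driver contains a field literally named `$numberDecimal`. A minimal standard-library sketch of that, with a hypothetical helper `find_dollar_keys` that lists such keys before an insert is attempted:

```python
import json


def find_dollar_keys(value, path=""):
    """Return dotted paths of every key starting with '$' (hypothetical helper)."""
    found = []
    if isinstance(value, dict):
        for key, sub in value.items():
            here = f"{path}.{key}" if path else key
            if key.startswith("$"):
                found.append(here)
            found.extend(find_dollar_keys(sub, here))
    elif isinstance(value, list):
        for i, sub in enumerate(value):
            found.extend(find_dollar_keys(sub, f"{path}.{i}" if path else str(i)))
    return found


# The standard parser leaves the Extended JSON wrapper as a plain dict.
doc = json.loads('{"IdTitulo": "Jaws", "IdDecimal": {"$numberDecimal": "2.33"}}')
print(doc["IdDecimal"])       # {'$numberDecimal': '2.33'}
print(find_dollar_keys(doc))  # ['IdDecimal.$numberDecimal']
```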

I am currently using a Python script to load the JSON. This is the script I've been experimenting with:

import os
import re
import io
import json
from pymongo import MongoClient
from bson.raw_bson import RawBSONDocument
from bson.json_util import CANONICAL_JSON_OPTIONS, dumps, loads
import bsonjs as bs

# connection
client = MongoClient('localhost', 27018, document_class=RawBSONDocument)
db = client['myDB']
coll = db['test_dec']
other_col = db['free']

load_dir = '/mnt/win/load'
for fname in os.listdir(load_dir):
    num = re.findall(r"\d+", fname)

    if num:
        # listdir returns bare names, so join them with the directory
        with io.open(os.path.join(load_dir, fname), encoding="ISO-8859-1") as f:

            doc_data = loads(dumps(f, json_options=CANONICAL_JSON_OPTIONS))

            print(doc_data)

            test = '{"idTitulo":"La pelicula","idRelease":2019}'
            raw_bson = bs.loads(test)
            load_raw = RawBSONDocument(raw_bson)

            db.other_col.insert_one(load_raw)


client.close()

The input is a JSON file, so if I write anything like Decimal128('2.33') in it, parsing fails with "ValueError: No JSON object could be decoded", because that is not valid JSON.

The result of

    db.other_col.insert_one(load_raw)

is that the contents of "test" are inserted. But I cannot wrap doc_data in a RawBSONDocument; that fails with:

  TypeError: unpack_from() argument 1 must be string or buffer, not list

When I do manage to pass the JSON directly to RawBSONDocument, all the raw text ends up in the document: each line of the file becomes a separate field, and the record in the database looks like this:

    {
        "_id" : ObjectId("5eb2920a34eea737626667c2"),
        "0" : "{\n",
        "1" : "\t\"IdTitulo\": \"Gremlins\",\n",
        "2" : "\t\"IdDirector\": \"Joe Dante\",\n",
        "3" : "\t\"IdNumber\": 6,\n",
        "4" : "\"IdDate\": {\"$date\": \"2010-06-18T:00.12:00Z\"}\t\n",
        "5" : "}\n"
    }

It seems it is not that simple to load Extended JSON into MongoDB. I am using the extended format because I want to use schema validation.

Oleg pointed out that it is numberDecimal and not NumberDecimal, as I had it before. I've fixed the JSON file, but nothing changed.

I then executed:

with io.open(fname, encoding="ISO-8859-1") as f:
    doc_data = json.load(f)
    coll.insert(doc_data)

And the json file:

 {
    "IdTitulo": "Gremlins",
    "IdDirector": "Joe Dante",
    "IdNumber": 6,
    "IdDecimal": {"$numberDecimal": "3.45"}
 }

Solution

  • One more roll of the dice from me. If you are using schema validation as you are, I would recommend defining a class that is explicit about each field and about how that field is converted to the relevant Python datatype. Your solution is generic, but the data structure has to be rigid anyway to pass the validation.

    IMO this is clearer, and you have control over any conversion errors within the class.

    Just to confirm: I ran this with your schema validation in place, and the inserted document passes it.

    from pymongo import MongoClient
    from bson.decimal128 import Decimal128
    import dateutil.parser  # third-party package: python-dateutil
    import json
    
    class Film:
        def __init__(self, file):
            data = file.read()
            loaded = json.loads(data)
            self.IdTitulo = loaded.get('IdTitulo')
            self.IdDirector = loaded.get('IdDirector')
            # plain string "2.33" -> BSON decimal, matching bsonType "decimal"
            self.IdDecimal = Decimal128(loaded.get('IdDecimal'))
            # Python int -> BSON int32 when it fits, matching bsonType "int"
            self.IdNumber = int(loaded.get('IdNumber'))
            # ISO 8601 string -> datetime, stored as a BSON date
            self.IdDateTime = dateutil.parser.parse(loaded.get('IdDateTime'))
    
        def insert_one(self, collection):
            # the instance attributes become the document (insert_one adds _id to this dict)
            collection.insert_one(self.__dict__)
    
    client = MongoClient()
    mycollection = client.mydatabase.test_dec
    
    with open('c:/temp/1.json', 'r') as jfile:
        film = Film(jfile)
        film.insert_one(mycollection)
    

    gives:

    > db.test_dec.findOne()
    {
            "_id" : ObjectId("5eba79eabf951a15d32843ae"),
            "IdTitulo" : "Jaws",
            "IdDirector" : "Steven Spielberg",
            "IdDecimal" : NumberDecimal("2.33"),
            "IdNumber" : 8,
            "IdDateTime" : ISODate("2020-05-12T10:08:21Z")
    }
    

    >

    JSON file used:

    {
        "IdTitulo": "Jaws",
        "IdDirector": "Steven Spielberg",
        "IdNumber": 8,
        "IdDecimal": "2.33",
        "IdDateTime": "2020-05-12T11:08:21+0100"
    }