Search code examples
pythonmongodbpython-2.7pandaspymongo

Insert a Pandas Dataframe into mongodb using PyMongo


What is the quickest way to insert a pandas DataFrame into mongodb using PyMongo?

Attempts

db.myCollection.insert(df.to_dict())

gave an error

InvalidDocument: documents must have only string keys, the key was Timestamp('2013-11-23 13:31:00', tz=None)


 db.myCollection.insert(df.to_json())

gave an error

TypeError: 'str' object does not support item assignment


 db.myCollection.insert({id: df.to_json()})

gave an error

InvalidDocument: documents must have only string a keys, key was <built-in function id>


df

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 150 entries, 2013-11-23 13:31:26 to 2013-11-23 13:24:07
Data columns (total 3 columns):
amount    150  non-null values
price     150  non-null values
tid       150  non-null values
dtypes: float64(2), int64(1)

Solution

  • I doubt there is a both quickest and simple method. If you don't worry about data conversion, you can do

    >>> import json
    >>> df = pd.DataFrame.from_dict({'A': {1: datetime.datetime.now()}})
    >>> df
                               A
    1 2013-11-23 21:14:34.118531
    
    >>> records = json.loads(df.T.to_json()).values()
    >>> db.myCollection.insert(records)
    

    But in case you try to load data back, you'll get:

    >>> df = read_mongo(db, 'myCollection')
    >>> df
                         A
    0  1385241274118531000
    >>> df.dtypes
    A    int64
    dtype: object
    

    so you'll have to convert 'A' columnt back to datetimes, as well as all not int, float or str fields in your DataFrame. For this example:

    >>> df['A'] = pd.to_datetime(df['A'])
    >>> df
                               A
    0 2013-11-23 21:14:34.118531