I'm doing a rather simple insert into a local MongoDB, sourced from a Python pandas DataFrame. Essentially I'm calling dataframe.loc[n].to_dict() and getting my dictionary directly from the df. All is well so far until I attempt the insert, where I get a 'Cannot encode object' error. Looking at the dict directly, everything looked fine, but then (while writing this question) it dawned on me to check each type in the dict, and I found that a long ID number had been converted to a numpy.int64 instead of a plain int (when I created the dict manually with a plain int, the insert worked fine).
So, I was unable to find anything in the pandas documentation about arguments to to_dict that would let me override this behavior. While there are brute-force ways to fix the issue, there must be a more elegant way to sort it out without resorting to that sort of thing.
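For reference, the brute-force conversion I have in mind looks roughly like this: walk the dict and coerce any numpy scalar to its native Python equivalent with .item() before handing it to pymongo. The DataFrame contents and the collection are just placeholders here.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for my real data
df = pd.DataFrame({'Twitter ID': [1234567890123], 'Followers': [13]})

row = df.loc[0].to_dict()

# Coerce numpy scalar types (np.int64, np.float64, ...) to native
# Python types so the BSON encoder can handle them
clean = {k: (v.item() if isinstance(v, np.generic) else v)
         for k, v in row.items()}

# collection.insert_one(clean)  # hypothetical pymongo collection
```

It works, but it feels like something pandas or the driver ought to handle for me.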
The question, then: how do I convert a row of a DataFrame to a dict for insertion into MongoDB, ensuring I use only acceptable content types? Or, can I back up further here and use a simpler approach to make each row of a DataFrame a document within Mongo?
Thanks
As requested, here is an addendum to the post with a sample of the data I am using.
{'Account Created': 'about 3 hours ago',
'Followers': 13,
'Following': 499,
'Screen Name': 'XXXXXXXXXX',
'Status': 'Alive',
'Tweets': 12,
'Twitter ID': 0000000000L}
This is directly from the to_dict output that faulted on insert. I copied this directly into a 'test' dict and that worked perfectly fine. If I print out the values of each of the dicts I get the following...
to_dict = ['Alive', 'a_aheref77', 'about 3 hours ago', 12, 13, 499, 0000000000L, ObjectId('551bd8cfae89e9370851aa64')]
test = ['Alive', 'XXXXXXXX', 'about 3 hours ago', 499, 13, 12, 0000000000, ObjectId('551bd6fdae89e9370851aa63')]
The only difference (as far as I can tell) is the long int, which, interestingly enough, shows up as 'NumberLong' within the document when the Mongo insert does succeed. Hope this helps clarify some.
Take a look at the odo library, in particular its mongodb docs. Pandas isn't likely to grow any kind of to_mongo method in the near future, so odo is where this sort of functionality should go. Here's an example with a simple DataFrame:
In [13]: import pandas as pd
In [14]: from odo import odo
In [15]: df = pd.DataFrame({'a': [1, 2, 3], 'b': list('abc')})
In [17]: m = odo(df, 'mongodb://localhost/db::t')
In [18]: list(m.find())
Out[18]:
[{u'_id': ObjectId('551bfb20362e696200d568d9'), u'a': 1, u'b': u'a'},
{u'_id': ObjectId('551bfb20362e696200d568da'), u'a': 2, u'b': u'b'},
{u'_id': ObjectId('551bfb20362e696200d568db'), u'a': 3, u'b': u'c'}]
You can get the required deps and odo by doing
conda install odo pymongo --channel blaze
or
pip install odo
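If you'd rather avoid an extra dependency and use pymongo directly, one common workaround (a sketch, with a toy DataFrame and hypothetical database/collection names) is to round-trip the frame through JSON, which coerces numpy scalars to native Python types, and then insert the resulting list of dicts:

```python
import json
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': list('abc')})

# to_json serializes numpy scalars as plain JSON numbers/strings,
# so loading it back gives BSON-friendly native Python types
records = json.loads(df.to_json(orient='records'))

# from pymongo import MongoClient
# MongoClient().db.t.insert_many(records)  # hypothetical db/collection
```

Each element of records is one document per DataFrame row, which also answers the "simpler approach" part of the question.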