I am having an issue with number formats while retrieving data from Google Sheets using GSpread and comparing it with a Pandas DF populated from Twitter, using Tweepy.
Basically, when I receive data from Twitter, I have some long numbers, which are tweet IDs, such as:
When I first populate my Google Sheet (using set_with_dataframe), the IDs are written fine, but when I read this data back from the sheet into a df (using get_as_dataframe), the IDs change to what appears to be scientific notation. The number above ends up like:
As you can imagine, this prevents me from having unique IDs, because the numbers are rounded, so I can't compare data coming from Twitter with data from Google Sheets. I tried changing the number format in Google Sheets, but it doesn't help, and I couldn't find any reference on the Pandas side. It looks like the data already arrives from get_as_dataframe in scientific notation.
Any ideas of how I could solve this?
My suggestion for ID numbers would be to convert them to strings and keep them that way throughout the process. Unless they are expected to be numerically transformed, there's no reason to keep them as number types.
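To illustrate why the IDs stop being unique, here is a minimal sketch in plain Python. The ID below is made up for illustration and is not taken from the question. It shows that a 19-digit integer does not survive a round trip through a 64-bit float, while a string keeps every digit.

```python
# Hypothetical 19-digit ID, made up for illustration only.
tweet_id = 1234567890123456789

# A 64-bit float has a 53-bit significand, so integers above 2**53
# cannot all be represented exactly and get rounded.
as_float = float(tweet_id)
round_tripped = int(as_float)

# Keeping the ID as a string preserves every digit.
as_text = str(tweet_id)

print(round_tripped == tweet_id)  # False: the float conversion rounded the ID
print(int(as_text) == tweet_id)   # True: the string kept it intact
```

Two distinct IDs that differ only in their last digits can round to the same float, which is why comparisons on the numeric column fail.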
The get_as_dataframe function in gspread-dataframe supports datatypes:
dtype: Type name or dict of column -> type, optional. Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'}. Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.
That is from https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html, which applies here because the gspread-dataframe docs say:
The get_as_dataframe function supports the keyword arguments that are supported by your Pandas version’s text parsing readers, such as pandas.read_csv.
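As a rough sketch of the dtype idea, the example below uses pandas.read_csv on an in-memory CSV as a stand-in for the sheet, since the docs quoted above say get_as_dataframe accepts the same keyword arguments. The column name id and the ID value are assumptions made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV standing in for the sheet contents; the ID is made up.
csv_text = "id,text\n1234567890123456789,hello\n"

# dtype={"id": str} tells the parser to keep the column as text
# instead of interpreting it as a number.
df = pd.read_csv(io.StringIO(csv_text), dtype={"id": str})

print(df["id"].iloc[0])
```

With gspread-dataframe, the corresponding call would presumably look like get_as_dataframe(worksheet, dtype={"id": str}), where worksheet is a placeholder for your own worksheet object. This can only help if the sheet itself still holds the full digits, so it may also be necessary to convert the column to strings before calling set_with_dataframe.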