
odo converting data between csv and mysql


When converting one of my .csv files to a pd.DataFrame with the Python odo module, I get a TypeError:

    TypeError: Cannot cast array from dtype('float64') to dtype('int64') 
               according to the rule 'safe'
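
For context, the wording of this message comes from NumPy's casting rules. The sketch below uses made-up values (they are an assumption, not data from the question) and shows that a `'safe'` cast from float64 to int64 is always refused, while an `'unsafe'` cast goes through and truncates:

```python
import numpy as np

# Hypothetical float64 data, e.g. a column that contains one fractional value.
values = np.array([1.0, 2.5, 3.0])

# Under the 'safe' rule only conversions that cannot lose information are
# allowed, so float64 -> int64 is rejected regardless of the actual values.
try:
    values.astype(np.int64, casting="safe")
    safe_cast_error = None
except TypeError as exc:
    safe_cast_error = exc

print(safe_cast_error)

# Under the 'unsafe' rule the conversion is permitted and the fractional
# part is dropped.
truncated = values.astype(np.int64, casting="unsafe")
print(truncated)
```

So the error suggests that a column was declared or guessed as an integer type while the parsed data turned out to be floating point.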

The same code works for my other CSV files:

    # csv table file name
    csvNm = 'table.csv'

    # convert mysql table to csv
    odo_csv = odo(tstConn.connect_string + '::' + tbl, csvNm)

    # convert csv to pandas
    odo_df = odo(odo_csv, pd.DataFrame)

Here is what I tried so far to no avail:

    import pandas as pd
    from odo import odo, resource, discover, convert

    odo_csv = odo(tstConn.connect_string + '::' + tbl, csvNm)
    csv = resource(csvNm)
    ds = discover(csv)

    # Convert csv to pandas
    odo_df = odo(odo_csv, pd.DataFrame, dshape=ds)

and this:

    odo_df = odo(odo_csv, pd.DataFrame, casting='unsafe')

Update 1: It looks like I overlooked the most obvious hint in the traceback:

    pandas\parser.pyx in pandas.parser.TextReader._convert_tokens (pandas\parser.c:11816)()

This led me to reports of encoding issues on Windows on SO. But neither this:

    odo_df = odo(odo_csv, pd.DataFrame, encoding=odo_csv.encoding)

nor this works:

    odo_df = odo(odo_csv, pd.DataFrame, encoding='cp1252')

There is also this approach, which is inelegant for my use case, taken from pandas-reading-csv-files (same link as above):

    # Python3
    with open('/tmp/test.csv', 'r', encoding='cp1252') as f:
        df = pd.read_csv(f)
        print(df)
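
As a side note on the encoding theory, here is a minimal standard-library sketch of what a codec mismatch looks like. The file contents are hypothetical and the file is written to a temporary path. The failure it produces is a `UnicodeDecodeError`, so a pure encoding problem would usually surface differently from the dtype cast error above:

```python
import csv
import os
import tempfile

# Hypothetical sample rows containing a non-ASCII character.
rows = [["id", "name"], ["1", "café"]]

# Write the file as cp1252, as a Windows tool might.
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
with open(path, "w", encoding="cp1252", newline="") as f:
    csv.writer(f).writerows(rows)

# Reading with a non-matching codec fails while decoding the bytes ...
try:
    with open(path, "r", encoding="utf-8", newline="") as f:
        list(csv.reader(f))
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# ... while the matching codec round-trips the data.
with open(path, "r", encoding="cp1252", newline="") as f:
    decoded_rows = list(csv.reader(f))

os.remove(path)
print(type(utf8_error).__name__, decoded_rows)
```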

I'm not sure what to try next; any help would be appreciated.


Solution

  • The solution that works is:

    import pandas as pd
    from odo import odo, resource, discover, convert
    
    # convert mysql to csv
    odo_csv = odo(raw_dbConn.connect_string + '::' + tblName, csvNm, header=True)
    
    # Get odo resource aka sqlalchemy.Table instance
    resc = resource(raw_dbConn.connect_string + '::' + tblName)
    
    # Discover the resc
    ds = discover(resc)
    
    # Convert csv to dataframe
    odo_df = odo(odo_csv, pd.DataFrame, dshape=ds, encoding=odo_csv.encoding)
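
One plausible reading of why this helps (an assumption, not something the answer states): a datashape discovered from the database table reflects the declared column types, whereas a datashape discovered from the CSV may be guessed from only part of the file. The pandas-only sketch below imitates that situation with a hypothetical `qty` column that looks like an integer in its first rows:

```python
import io

import pandas as pd

# Hypothetical CSV: 'qty' looks like an integer column in the first rows,
# but a later row holds a fractional value.
csv_text = "id,qty\n1,10\n2,20\n3,30.5\n"

# Forcing the type guessed from the first rows (int64) fails on the last row.
# The exception type and message vary between pandas versions.
try:
    pd.read_csv(io.StringIO(csv_text), dtype={"qty": "int64"})
    guessed_schema_error = None
except (TypeError, ValueError) as exc:
    guessed_schema_error = exc

print(guessed_schema_error)

# Supplying the schema the source table would declare (assumed here to be
# a float column) parses cleanly.
df = pd.read_csv(io.StringIO(csv_text), dtype={"id": "int64", "qty": "float64"})
print(df.dtypes)
```

If this reading is right, the `dshape=ds` argument is what does the work in the solution, and the `encoding` argument is incidental.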