I have a DataFrame in pandas and need to insert all of its values, which include Chinese characters, into a data warehouse. The Chinese characters are inserted as junk, for example ????? or 百å¨è‹±åšï¼ˆèˆŸå±±ï¼‰å•¤é…’有é™å…¬å¸. The insert query is prepared dynamically. I need help with the following scenario:
Read the file as UTF-8 and write it into the data warehouse over a pyodbc connection using the UTF-8 character set.
df = pd.read_csv(filename, dtype='str', encoding='UTF-8')
cnxn = database_connect()  # connect to the database
cnxn.setencoding(ctype=pyodbc.SQL_CHAR, encoding='UTF-8')
cnxn.autocommit = True
cursor = cnxn.cursor()
for y in range(len(df)):
    inst = 'insert into ' + tablename + ' values ('
    for x in range(len(clm)):
        if str(df.iloc[y, x]) == 'nan':
            df.iloc[y, x] = ''
        if x != len(clm) - 1:
            inst_val = inst_val + "'" + str(df.iloc[y, x]).strip().replace("'", '') + "'" + ","
        else:
            inst_val = inst_val + "'" + str(df.iloc[y, x]).strip().replace("'", '') + "'" + ")"
    inst = inst + inst_val  # prepare the insert statement from the in-memory values
    inst_val = ''
    print("Inserting value into table")
    try:
        cursor.execute(inst)  # execute the insert statement
        print("1 row inserted")
    except Exception as e:
        print(inst)
        print(e)
The values should be inserted into the SQL data warehouse exactly as they appear in the file.
You are using dynamic SQL to construct string literals containing Chinese characters, but you are creating them as
insert into tablename values ('你好')
when SQL Server expects Unicode string literals to be of the form
insert into tablename values (N'你好')
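If the dynamic SQL has to stay for some reason, the literal-building step could be adjusted roughly as follows. This is only a sketch: `to_nvarchar_literal` is a hypothetical helper name, not part of pyodbc, and string-built SQL remains open to injection even with quotes doubled.

```python
def to_nvarchar_literal(value):
    """Wrap a Python value as a SQL Server Unicode string literal: N'...'."""
    # Double any embedded single quote, which is how T-SQL escapes a quote
    # inside a string literal (instead of silently dropping it).
    return "N'" + str(value).replace("'", "''") + "'"


# Illustrative values only; in the question these come from the DataFrame.
values = ["你好", "O'Brien"]
inst = (
    "insert into tablename values ("
    + ", ".join(to_nvarchar_literal(v) for v in values)
    + ")"
)
print(inst)  # insert into tablename values (N'你好', N'O''Brien')
```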
You would be better off using a proper parameterized query, which avoids such issues:
sql = "insert into tablename values (?)"
params = ('你好',)
cursor.execute(sql, params)
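Applied to a whole table, the same idea might look like the sketch below. The helper names (`build_insert_sql`, `clean_value`, `rows_to_params`) are made up for illustration, and the sample rows stand in for the DataFrame contents. The table name cannot be sent as a parameter, so it is assumed to come from a trusted source.

```python
import math


def build_insert_sql(table_name, column_count):
    """Return an INSERT statement with one '?' placeholder per column."""
    placeholders = ", ".join("?" for _ in range(column_count))
    # Only values can be parameterized; table_name is assumed to be trusted.
    return f"insert into {table_name} values ({placeholders})"


def clean_value(value):
    """Turn missing values (None or float NaN) into '' and strip whitespace."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    return str(value).strip()


def rows_to_params(rows):
    """Convert an iterable of rows into a list of parameter tuples."""
    return [tuple(clean_value(v) for v in row) for row in rows]


# Sample rows standing in for the DataFrame in the question.
rows = [["  你好 ", float("nan")], ["O'Brien", "啤酒"]]
sql = build_insert_sql("tablename", 2)
params = rows_to_params(rows)

# With a live pyodbc cursor (not executed here):
# cursor.executemany(sql, params)
```

With the DataFrame from the question, the rows could be taken from `df.itertuples(index=False, name=None)`. Because the values travel as parameters, apostrophes no longer need to be removed and no `N'...'` prefix is required.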