Tags: python, pandas, azure, pypyodbc

Chinese character insert issue


I have a dataframe in pandas (shown as an image in the original post).

I need to insert all of its values into a data warehouse, including Chinese characters, but the Chinese characters are inserted as junk (?????), for example: 百å¨è‹±åšï¼ˆèˆŸå±±ï¼‰å•¤é…’有é™å…¬å¸. The insert query is prepared dynamically. I need help with the following scenario:

Read a file as UTF-8 and write it into a data warehouse over a pyodbc connection using the UTF-8 character set.

import pandas as pd
import pyodbc

# filename, tablename, clm and database_connect() are defined elsewhere in the script
df = pd.read_csv(filename, dtype='str', encoding='UTF-8')
cnxn = database_connect()  # connect to the database
cnxn.setencoding(ctype=pyodbc.SQL_CHAR, encoding='UTF-8')
cnxn.autocommit = True
cursor = cnxn.cursor()
for y in range(len(df)):
    inst = 'insert into ' + tablename + ' values ('
    inst_val = ''  # reset the value list for each row
    for x in range(len(clm)):
        if str(df.iloc[y, x]) == 'nan':
            df.iloc[y, x] = ''
        if x != len(clm) - 1:
            inst_val = inst_val + "'" + str(df.iloc[y, x]).strip().replace("'", '') + "'" + ","
        else:
            inst_val = inst_val + "'" + str(df.iloc[y, x]).strip().replace("'", '') + "'" + ")"
    inst = inst + inst_val  # prepare the insert statement from the in-memory values
    print("Inserting value into table")
    try:
        cursor.execute(inst)  # execute the insert statement
        print("1 row inserted")
    except Exception as e:
        print(inst)
        print(e)

The values should be inserted into the SQL data warehouse exactly as they appear in the file.


Solution

  • You are using dynamic SQL to construct string literals containing Chinese characters, but you are creating them as

    insert into tablename values ('你好')
    

    when SQL Server expects Unicode string literals to be of the form

    insert into tablename values (N'你好')
    

    You would be better off using a proper parameterized query, which avoids such issues:

    sql = "insert into tablename values (?)"
    params = ('你好',)
    cursor.execute(sql, params)
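
    As a rough sketch of how the same idea could extend to a whole table, the helper below builds one `?` placeholder per column and passes every row as parameters. It is only an illustration under stated assumptions: `build_insert_sql`, `insert_rows`, and `demo_table` are hypothetical names, and an in-memory `sqlite3` database stands in for the real warehouse connection so the example runs anywhere (pyodbc accepts the same `?` placeholder style).

```python
import sqlite3  # stand-in for the pyodbc connection; both accept "?" placeholders


def build_insert_sql(tablename, n_columns):
    """Return an INSERT statement with one "?" placeholder per column."""
    placeholders = ", ".join("?" for _ in range(n_columns))
    # The table name cannot be passed as a parameter, so it is still
    # concatenated; it should come from trusted code, not from the file.
    return f"insert into {tablename} values ({placeholders})"


def insert_rows(cursor, tablename, rows):
    """Insert every row with a parameterized query and return the row count."""
    rows = [tuple(row) for row in rows]
    if not rows:
        return 0
    sql = build_insert_sql(tablename, len(rows[0]))
    cursor.executemany(sql, rows)
    return len(rows)


# Demo against an in-memory SQLite database. Assumption: demo_table and its
# two text columns are made up purely for illustration.
cnxn = sqlite3.connect(":memory:")
cursor = cnxn.cursor()
cursor.execute("create table demo_table (id text, name text)")
inserted = insert_rows(cursor, "demo_table", [("1", "你好"), ("2", "O'Brien")])
cnxn.commit()
stored = cursor.execute("select name from demo_table order by id").fetchall()
print(inserted, stored)
```

    With the dataframe from the question, the rows could presumably be produced with something like `df.fillna('').itertuples(index=False, name=None)` and passed to the pyodbc cursor instead. Because the values travel as parameters rather than as string literals, there is no need for the `N'...'` prefix or for stripping single quotes from the data.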