When i try to enter data in a text field in Greek letters my wsgi script saves that data as jibberish in the MySQL database and i dont know why. Here is the relative code when the data is about to be posted via form method:
pdata = pdata + '''
<form methods="POST" enctype="multipart/form-data" action="%s">
<tr>
<td> <center> <input type="text" name="task" size=50> </td>
<td> <center> <input type="text" name="price" size=5> </td>
<td> <center> <input type="text" name="lastvisit"> </td>
</table><br><br>
<td> <input type="image" src="/static/img/submit.gif" name="update" value="Ενημέρωση!"> </td>
</tr>
</form>
''' % app.get_url( '/update/<name>', name=name )
pdata = pdata + "<meta http-equiv='REFRESH' content='200;%s'>" % app.get_url( '/' )
return pdata
And here is the relative callback function who tries to enter the posted-form data into MySQL database.
@app.route( '/update/<name>' )
def update( name ):
pdata = ''
task = request.query.get('task')
price = request.query.get('price')
lastvisit = request.query.get('lastvisit')
# check if date entered as intented, format it properly for MySQL
lastvisit = datetime.strptime(lastvisit, '%d %m %Y').strftime('%Y-%m-%d')
if( ( task and len(task) <= 200 ) and ( price and price.isdigit() and len(price) <= 3 ) and lastvisit != "error" ):
# find the requested client based on its name
cur.execute('''SELECT ID FROM clients WHERE name = %s''', name )
clientID = cur.fetchone()[0]
try:
# found the client, save primary key and use it to issue hits & money UPDATE
cur.execute('''UPDATE clients SET hits = hits + 1, money = money + %s WHERE ID = %s''', ( int(price), clientID ))
# update client profile by adding a new record
cur.execute('''INSERT INTO jobs (clientID, task, price, lastvisit) VALUES (%s, %s, %s, %s)''', ( clientID, task, price, lastvisit ))
except:
cur.rollback()
I cannot understand why the data is stored into database as jibberish instead of proper utf-8. Also trying to use utf-8 encoding type didn't work either.
<form methods="POST" enctype="utf-8" action="%s">
The html form data to be posted is "αυτή είναι μια δοκιμή" and the end result inside database is "αÏÏή είναι μια δοκιμή"
However, "αυτή είναι μια δοκιμή" is apparently Invalid UTF-8 because
Byte at position 38 (ή
) indicates it's a two-byte UTF-8 character but only 1 byte follows (reference).
If this is exactly the data that is passed to the code; then you need to check and confirm that the HTML form is submitting data in the correct UTF-8 format.
<form accept-charset='UTF-8'>
Assuming your input string is correctly UTF-8 encoded, your output string "αÏÏή είναι μια δοκιμή" is UTF-7 or more probably ISO-8859-1 encoded (reference).
Therefore the issue may be the transport mechanism (as defined above; in the HTML form submission) or the database storage encoding.
Yes MySQL Tables and Columns are configured to be
utf8_general_ci
.
This may also be a problem. MySQL utf8_
is NOT full UTF-8 (wat?!) as it is 3-byte instead of 4-byte; therefore if you have a 4-byte UTF-8 character stored, it will offset all later character bytes and make them look like rubbish.
Update your MySQL columns and all collations to utf8mb4_unicode_ci
Also check to ensure your MySQL transport mechanism is using utf8mb4_
as well.