Search code examples
pythonhtmlbottle

HTML posted form data gets written as jibberish into MySQL database


When i try to enter data in a text field in Greek letters my wsgi script saves that data as jibberish in the MySQL database and i dont know why. Here is the relative code when the data is about to be posted via form method:

pdata = pdata + '''
<form methods="POST" enctype="multipart/form-data" action="%s">
    <tr>
            <td> <center>   <input type="text"  name="task"     size=50>    </td>
            <td> <center>   <input type="text"  name="price"    size=5>     </td>
            <td> <center>   <input type="text"  name="lastvisit">           </td>
        </table><br><br>
        <td>    <input type="image" src="/static/img/submit.gif" name="update" value="Ενημέρωση!">  </td>
    </tr>
</form>
''' % app.get_url( '/update/<name>', name=name )


pdata = pdata + "<meta http-equiv='REFRESH' content='200;%s'>" % app.get_url( '/' )
return pdata

And here is the relative callback function who tries to enter the posted-form data into MySQL database.

@app.route( '/update/<name>' )

def update( name ):

pdata = ''

task = request.query.get('task')
price = request.query.get('price')
lastvisit = request.query.get('lastvisit')


# check if date entered as intented, format it properly for MySQL
lastvisit = datetime.strptime(lastvisit, '%d %m %Y').strftime('%Y-%m-%d')


if( ( task and len(task) <= 200 ) and ( price and price.isdigit() and len(price) <= 3 ) and lastvisit != "error" ):
    # find the requested client based on its name
    cur.execute('''SELECT ID FROM clients WHERE name = %s''', name )
    clientID = cur.fetchone()[0]

    try:
        # found the client, save primary key and use it to issue hits & money UPDATE
        cur.execute('''UPDATE clients SET hits = hits + 1, money = money + %s WHERE ID = %s''', ( int(price), clientID ))

        # update client profile by adding a new record
        cur.execute('''INSERT INTO jobs (clientID, task, price, lastvisit) VALUES (%s, %s, %s, %s)''', ( clientID, task, price, lastvisit ))
    except:
        cur.rollback()

I cannot understand why the data is stored into database as jibberish instead of proper utf-8. Also trying to use utf-8 encoding type didn't work either.

<form methods="POST" enctype="utf-8" action="%s">

Solution

  • The html form data to be posted is "αυτή είναι μια δοκιμή" and the end result inside database is "αÏÏή είναι μια δοκιμή"

    However, "αυτή είναι μια δοκιμή" is apparently Invalid UTF-8 because Byte at position 38 (ή) indicates it's a two-byte UTF-8 character but only 1 byte follows (reference).

    If this is exactly the data that is passed to the code; then you need to check and confirm that the HTML form is submitting data in the correct UTF-8 format.

    <form accept-charset='UTF-8'>
    

    Assuming your input string is correctly UTF-8 encoded, your output string "αÏÏή είναι μια δοκιμή" is UTF-7 or more probably ISO-8859-1 encoded (reference).

    Therefore the issue may be the transport mechanism (as defined above; in the HTML form submission) or the database storage encoding.

    Yes MySQL Tables and Columns are configured to be utf8_general_ci.

    This may also be a problem. MySQL utf8_ is NOT full UTF-8 (wat?!) as it is 3-byte instead of 4-byte; therefore if you have a 4-byte UTF-8 character stored, it will offset all later character bytes and make them look like rubbish.

    Solution:

    Update your MySQL columns and all collations to utf8mb4_unicode_ci

    Also check to ensure your MySQL transport mechanism is using utf8mb4_ as well.

    And Please Read This