Search code examples
phpmysqlajaxutf-8utf8mb4

Data is UTF-8, Ajax is returning some characters incorrectly


I have spent a lot of time searching the web and seeing a lot of similar answers, but cannot find anything that works for my situation.

I spent some time converting my MySQL database to UTF-8 from latin1 (the default). I emptied the tables (truncate), and re-imported the data from a text file. I have set the headers for my pages to use the meta tag to UTF-8:

<meta http-equiv="Content-Type" content="text/html" charset="UTF-8" />

And that's being used everywhere. When I read data via PHP and output it into forms, and such, the data appears correctly, text like: Königstadt appear as one might hope. Saving them (update set ...) they seem to be okay, because when I re-read them from the data the display in the form is correct. (The display in PHP Admin shows "Königstadt", which is weird, but when I read the data it seems to be correct ... -- I am hoping this is some oddity of PHPAdmin)

Where everything seems to be falling down on me is my Ajax code, when I retrieve data via PHP using Ajax. The following is a relatively simple routine that calls a PHP program to generate the option tags for an SELECT:

     function get_branches()
     {
        // numeric values that need to be passed to the
        // routine (so we show the correct item selected)
        var region = document.getElementById("search_region").value;
        // the code in load_branches needs this, but ...
        var branch = document.getElementById("search_branch").value;

        $.ajax
        ({
           type: "POST",
           url: "<?php echo $Roster_html_RootPath; ?>lookups/load_branches.php",
           data: {
                   'region' : region,
                   'local_branch' : branch
                 },
           //cache: false,
           success: function(data)
           {
              // load contents of DIV tag with id of branchoptions:
              $("#branch_options").html(data);
           }  // end success
        }); // end ajax call
     }; // end function get_branches()

  }); // end document.ready ...

Most of the records returned are fine. However the one shown above (Königstadt) looks like: Königstadt in the HTML Select that is returned.

I have been trying to find a solution, such as setting contentType for Ajax, here are things I have tried:

contentType: "application/x-www-form-urlencoded;charset=utf-8",

This one doesn't seem to make any difference at all. Nothing changes.

contentType: "application/text; charset=utf-8",

(or application/json) This kills the values being passed to the PHP file -- the data array doesn't seem to get there, because I get errors from PHP:

Notice: Undefined index: region in C:\xampp\htdocs\Heralds\Roster\lookups\load_branches.php on line 32

Notice: Undefined index: local_branch in C:\xampp\htdocs\Heralds\Roster\lookups\load_branches.php on line 33

I am at a complete loss how to return the values correctly. I need versions for both text or html (where I return an html table or the option tags as here), but I also need to use the json array for some of my code to return the values correctly. None of them seem to work properly with UTF-8 encoded data. I have been working on this for some time now, and am very frustrated. The explanations I am seeing are not working or in some cases not making sense ...

PHP lookups/load_branches.php

<?php
// if session has not started:
session_start();

// load some basic configuration, including relative paths
// and variables needed ...
include_once( "../includes/configuration.php" );

// data connection
include_once( $Roster_RootPath . "includes/connect.php");

// values from Ajax code:
$region       = $_POST["region"];
$local_branch = $_POST["local_branch"];

// open the roster_branches table and get list
if( $region > 0 ) // check only needed for find_by_branch.php
{
   $branch_statement = "select * from roster_branches where region=" .  $region . " order by local";
}
else
{
   $branch_statement = "select * from roster_branches order by local";
}

// first, get the data from the table:
$branch_result = mysqli_query( $connect, $branch_statement );
if( !$branch_result )
{
   $out = "";
   $out .= "<div class='alert alert-danger'>";
   $out .= "<p><b>Error in SQL statement ...</b><br />";
   $errornum = mysqli_errno( $connect );
   $out .= "MySQL Error Number: " . $errornum . "<br />";
   $out .= "MySQL Error: " . mysqli_error( $connect ) . "<br />";
   $out .= "SQL Statement: " . $branch_statement . "</p>";
   $out .= "</div>";
   echo $out;
   die;   
}
else
{
   $out = "";
   // create select:
   $out = "<select class='form-control' id='local_branch' name='local_branch'>\n";

   // need the blank option:
   $out .= "   <option value=0 selected></option>\n";

   while( $branch_row = mysqli_fetch_array( $branch_result ) )
   {
      $id = $branch_row["rb_id"];
      $local = $branch_row["local"];
      $selected = "";
      if( $local_branch == $id )
      {
         $selected = " selected";
      }
      $out .= "<option value=" . $id . $selected . ">" . $local . "</option> \n";
   }

   $out .= "</select>\n";
   echo $out;      
} // we have something

?>

Solution

  • my MySQL database to UTF-8 from latin1 (the default).


    1) I used "utf8_unicode_ci". 2) I have no idea what "multibyte safe functions" you're talking about. I use mysqli_real_escape_string() when reading data from $_POST, and the usual mysqli_query() and so on functions

    This is the cause of your issue.

    There are 3 main places to work to correct this:

    1)

    You need to enable true UTF-8 (4-byte) in MySQL so that data stored in your SQL is stored as the correct UTF-8 characters. by universally using collations and character sets with the utf8mb4_ prefix.

    2)

    To ensure data from your application/PHP is saved correctly you then need to ensure that the data is passed to MySQL as UTF-8 4-byte characters by setting the connection character set to full (4-byte) UTF-8 in your PHP:

    $mysqliObject->set_charset('utf8mb4');   // object oriented style
    mysqli_set_charset($connect, 'utf8mb4');    // procedural code style
    

    3)

    Finally; you need to ensure that any processing PHP does with the resulting data is multi-byte aware; by using the mbstring set of functions.

    Most notably:

    So your top of each PHP page should look like this before any browser output)

    mb_internal_encoding('UTF-8');
    mb_http_output('UTF-8');
    

    Then if you do anything with str_<whatever> functions (and some others) you know it won't break you strings before they're output to the browser (ajx in this case).