Tags: php, mysql, joomla, character-encoding

Can't find the right encoding for inserting articles into the Joomla database


I'm writing a script to migrate an old website to Joomla. All the PHP scripts on the old website are encoded in UTF-8, and so is the import script.

To create the articles, I do this:

    $article = JTable::getInstance('content');

    $article->title            = $titre;
    $article->alias            = $alias;
    $article->introtext        = $contenu;
    $article->catid            = $idcat;
    $article->created          = JFactory::getDate()->toSql();
    $article->created_by_alias = 'Import';
    $article->state            = 1;
    $article->access           = 1;
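    $article->metadesc         = $description;
    // json_encode() escapes quotes and backslashes in the title that would otherwise break the JSON
    $article->metadata         = json_encode(array('page_title' => $titre, 'author' => '', 'robots' => ''));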
    $article->language         = '*';

    if (!$article->check())
        print $article->getError();

    if (!$article->store(TRUE))
        print $article->getError();

Everything looks fine after that, but I have a second pass that does this:

    $db = JFactory::getDbo();

    // Fetch the id and intro text of every article
    $query = $db->getQuery(true);
    $query->select('id, introtext');
    $query->from('#__content');
    $db->setQuery($query);
    $messages = $db->loadObjectList();

    foreach ($messages as $page)
    {
        $idarticle = (int) $page->id;

        $dom = new DOMDocument;
        @$dom->loadHTML(utf8_decode($page->introtext));
        ...
        // Build a fresh query for each article; reusing $query would accumulate SET and WHERE clauses
        $query = $db->getQuery(true);
        $fields = array('introtext = ' . $db->quote(utf8_encode($dom->saveHTML())));
        $conditions = array('id = ' . $idarticle);
        $query->update('#__content')->set($fields)->where($conditions);
        $db->setQuery($query);
        $result = $db->execute();
    }

I tried with and without utf8_decode/utf8_encode and the result is the same: accented characters come through fine, but some other characters are replaced with ?.
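
One likely contributor to this pattern: utf8_decode converts UTF-8 to ISO-8859-1, and any character without a Latin-1 equivalent (the euro sign, curly quotes, the ellipsis character) is replaced by ?, while accented letters survive. The sketch below reproduces that behaviour with mbstring instead of utf8_decode/utf8_encode, assuming the extension is available. latin1RoundTrip is a hypothetical helper name, not part of Joomla or of the script above.

```php
<?php
// Sketch: mimic utf8_decode() followed by utf8_encode() using mbstring.
// latin1RoundTrip() is a hypothetical helper used only for illustration.
function latin1RoundTrip(string $utf8): string
{
    // UTF-8 -> ISO-8859-1: characters with no Latin-1 equivalent become "?"
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

    // ISO-8859-1 -> UTF-8: the "?" placeholders cannot be turned back
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

$accented    = "caf\u{00E9}";   // "café": é exists in Latin-1
$typographic = "10 \u{20AC}";   // "10 €": the euro sign does not

var_dump(latin1RoundTrip($accented) === $accented);       // survives the round trip
var_dump(latin1RoundTrip($typographic) === $typographic); // does not survive
```

If this is what happens in the second pass, the damage is done before the text reaches the database, so changing the table collation would not bring the characters back.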


Solution

  • I found a solution, though I'm not sure whether it can cause problems. Before creating the article, I convert special characters this way:

    $html=htmlentities($html,ENT_NOQUOTES|ENT_SUBSTITUTE|ENT_DISALLOWED);
    $html=str_replace("&lt;","<",$html);
    $html=str_replace("&gt;",">",$html);
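
One possible problem with the snippet above: htmlentities also re-encodes ampersands, so entities already present in the content (for example &amp;) end up double-encoded. A variant that avoids this, sketched here under the assumption that mbstring is installed, converts only non-ASCII characters to numeric entities and leaves the markup untouched. encodeNonAscii is a hypothetical helper name.

```php
<?php
// Hypothetical helper: turn every non-ASCII character into a numeric entity
// and leave ASCII (including <, >, & and quotes) untouched.
function encodeNonAscii(string $html): string
{
    // Map format: [first code point, last code point, offset, mask]
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($html, $map, 'UTF-8');
}

// "<p>10 € &amp; café &lt;b&gt;</p>" written with escapes so the file encoding does not matter
$html = "<p>10 \u{20AC} &amp; caf\u{00E9} &lt;b&gt;</p>";

echo encodeNonAscii($html), PHP_EOL;
```

The result contains only ASCII, so it should pass unchanged through a later Latin-1 conversion step. Another route often suggested is to tell DOMDocument that the input is UTF-8 so that no conversion is needed at all; that is not shown here.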