I'm writing a script to migrate an old website into Joomla. On the old website all PHP scripts are UTF-8 encoded, and so is the import script.
To create the articles I do this:
```php
$article = JTable::getInstance('content');
$article->title = $titre;
$article->alias = $alias;
$article->introtext = $contenu;
$article->catid = $idcat;
$article->created = JFactory::getDate()->toSQL();
$article->created_by_alias = 'Import';
$article->state = 1;   // published
$article->access = 1;  // access level 1 (Public in a default install)
$article->metadesc = $description;
// json_encode() escapes quotes and backslashes that may appear in the title
$article->metadata = json_encode(array('page_title' => $titre, 'author' => '', 'robots' => ''));
$article->language = '*';
if (!$article->check()) {
    print $article->getError();
}
if (!$article->store(true)) {
    print $article->getError();
}
```
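When the metadata JSON is assembled by string concatenation, a title containing a double quote (or a backslash) produces invalid JSON. A small standalone check, using a made-up example title:

```php
<?php
// Build the metadata JSON for a title that contains double quotes, once by
// string concatenation and once with json_encode(), then try to decode both.
$titre = 'Le "grand" été'; // made-up example title

$concatenated = '{"page_title":"' . $titre . '","author":"","robots":""}';
$encoded = json_encode(array('page_title' => $titre, 'author' => '', 'robots' => ''));

var_dump(json_decode($concatenated)); // NULL: the unescaped quote ends the string early
var_dump(json_decode($encoded, true)['page_title'] === $titre); // bool(true)
```

By default json_encode() writes non-ASCII characters as `\uXXXX` escapes; passing `JSON_UNESCAPED_UNICODE` (PHP 5.4+) keeps them as UTF-8 instead. Both forms decode to the same string.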
Everything is fine at that point, but then a second pass does this:
```php
$query = $db->getQuery(true);
$query->select("id, introtext");
$query->from("#__content");
$db->setQuery((string) $query);
$messages = $db->loadObjectList();
foreach ($messages as $page)
{
    $idarticle = $page->id;
    $dom = new DOMDocument;
    @$dom->loadHTML(utf8_decode($page->introtext));
    // ...
    // Use a fresh query object for each UPDATE: reusing $query would keep
    // appending to the SET and WHERE clauses of the previous iterations.
    $update = $db->getQuery(true);
    $update->update('#__content')
        ->set('introtext = ' . $db->quote(utf8_encode($dom->saveHTML())))
        ->where('id = ' . (int) $idarticle);
    $db->setQuery($update);
    $result = $db->execute();
}
```
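libxml's HTML parser, used by DOMDocument::loadHTML(), has traditionally treated input without a charset declaration as ISO-8859-1. A minimal sketch of parsing the UTF-8 introtext without converting it to Latin-1 first, assuming the stored introtext is valid UTF-8: a meta tag declaring UTF-8 is prepended, and only the children of `<body>` are serialized so the added wrapper does not end up in the article. The function name `processIntroText` is hypothetical, and the DOM manipulation step is left as a placeholder.

```php
<?php
// Parse a UTF-8 HTML fragment with DOMDocument and serialize it back,
// without going through utf8_decode()/utf8_encode().
function processIntroText(string $html): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of using @

    // Declare the charset so libxml does not read the bytes as ISO-8859-1.
    $dom->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // ... DOM manipulation would go here (placeholder) ...

    // Serialize only the nodes inside <body>, i.e. the original fragment,
    // without the <html>/<head>/<body> wrapper that loadHTML() adds.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo processIntroText('<p>L’été</p>'), "\n";
```

Depending on the libxml version, non-ASCII characters may come back either as raw UTF-8 or as HTML entities; both render the same in the browser.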
I tried both with and without utf8_decode/utf8_encode and the result is the same: some characters, for example ’, are replaced with ?, while accented characters come through fine.
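utf8_decode() converts UTF-8 to ISO-8859-1 (Latin-1), and according to the PHP manual any character above U+00FF is replaced with `?`. Accented letters such as é exist in Latin-1 and survive; the typographic apostrophe ’ (U+2019) does not. The sketch below reproduces that round trip with mb_convert_encoding() instead (assuming the mbstring extension is available), because utf8_decode()/utf8_encode() are deprecated as of PHP 8.2; `latin1RoundTrip` is a hypothetical helper name.

```php
<?php
// Convert UTF-8 -> ISO-8859-1 -> UTF-8, the same round trip as
// utf8_encode(utf8_decode($s)).
function latin1RoundTrip(string $utf8): string
{
    // Characters with no Latin-1 code (such as U+2019) become '?' here.
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

echo latin1RoundTrip("L’été"), "\n"; // L?été
```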
I found a workaround, but I'm not sure whether it can cause problems: before creating the article, I convert special characters this way:
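```php
// Encode non-ASCII characters as entities, then restore the tag delimiters
$html = htmlentities($html, ENT_NOQUOTES | ENT_SUBSTITUTE | ENT_DISALLOWED, 'UTF-8');
$html = str_replace("&lt;", "<", $html);
$html = str_replace("&gt;", ">", $html);
```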
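",">",$html);">
A sketch of the same workaround as a standalone function, for comparison. It passes the charset explicitly (before PHP 5.4, htmlentities() defaulted to ISO-8859-1) and, as a variation on the code above, sets `double_encode` to `false` so that entities already present in the old pages, such as `&eacute;`, are not turned into `&amp;eacute;`. The function name `entityEncodeHtml` is hypothetical.

```php
<?php
// Encode every character that has a named HTML 4.01 entity, then turn the
// escaped tag delimiters back into real '<' and '>'.
function entityEncodeHtml(string $html): string
{
    $encoded = htmlentities(
        $html,
        ENT_NOQUOTES | ENT_SUBSTITUTE | ENT_DISALLOWED,
        'UTF-8', // explicit input charset
        false    // double_encode = false: leave existing entities untouched
    );
    return str_replace(array('&lt;', '&gt;'), array('<', '>'), $encoded);
}

echo entityEncodeHtml('<p>L’été &eacute;</p>'), "\n";
// <p>L&rsquo;&eacute;t&eacute; &eacute;</p>
```

With `double_encode` set to `false`, an escaped `&lt;` already present in the old text is also turned into a real `<` by the str_replace() step, so it is worth checking whether the old pages contain any.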