Tags: php, telegram, telegram-bot, php-telegram-bot

Telegram PHP bot: entity length and offset problem with emoji


I have a Telegram bot written in PHP. When the bot receives a $text, it performs several transformations on it.

Now I'm trying to catch embedded URLs through the message entities and put the plain URLs back into the text. However, when the text contains emojis or special characters, the length and offset of the entities are wrong. I've read here that the problem is that Telegram and PHP count characters differently because they use different encodings.

So I tried the code below, but with no luck: sometimes it works (probably by chance), and other times it grabs text earlier or later than expected.

$admin = ''; // Your user id goes here
$update = file_get_contents('php://input');
$update = json_decode($update, TRUE);

if (!empty($update['message']))
{
    $message = $update['message'];
    if (!empty($message["text"])) { $text = $message['text']; }
    if (!empty($message["entities"])) {
        foreach ($message["entities"] as $entity) {
            if (!empty($entity["type"]) && $entity["type"] === "text_link") {
                $url = $entity["url"];
            }
        }
    }
}

$links = array();

if (!empty($message["entities"])) {
    foreach ($message["entities"] as $entity) {
        if ($entity["type"] === "text_link") {
            $urls[] = $entity["url"];
            if (preg_match("/(https?:\/\/(?:www\.)?(?:amazon\.[a-z\.]+|amzn\.to+|amzn\.eu)\/[^\s]+)/i", $entity["url"], $matches)) {
                $links[] = array(
                    "url" => $entity["url"],
                    "offset" => $entity["offset"],
                    "length" => $entity["length"]
                );
            }
        }
    }
}

$text = htmlspecialchars($text, ENT_QUOTES);

if (!empty($links)) {
    foreach ($links as $link) {
        $url = $link["url"];
        $offset = $link["offset"];
        $length = $link["length"];

        // Extract the original text from the full text using offset and length
        $originalText = mb_substr($text, $offset, $length);

        // Replace the original text with the Amazon link
        $text = str_replace($originalText, $url, $text);
    }
}

inviaMessaggio($admin, $text, null, "true");


function inviaMessaggio($chat_id, $text, $tastiera, $anteprima = "true")
{
    $args = [];
    $args['chat_id'] = $chat_id;
    $args['text'] = $text;
    $args['parse_mode'] = "HTML";
    $args['disable_web_page_preview'] = $anteprima;
    $rm = null;
    if (!empty($tastiera))
    {
        $rm = json_encode(['inline_keyboard' => $tastiera]);
    }
    $args['reply_markup'] = $rm;
    //return curlRequest($GLOBALS['website'].'/sendMessage', $args);
    // Send the message
    $response = curlRequest($GLOBALS['website'].'/sendMessage', $args);

    // Extract the message ID from the JSON response
    $messageData = json_decode($response, true);
    if (isset($messageData['result'])) {
        $messageId = $messageData['result']['message_id'];

        // Return the raw response together with the message ID
        return ['response' => $response, 'message_id' => $messageId];
    }
}

Solution

  • I faced a similar problem and wrote this class to decode Telegram entities and convert them into Telegram-formatted text in PHP. You can use it to convert the text, then use a regular expression to extract the links from the resulting HTML tags.
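
If only the link replacement is needed, the offset problem can also be handled directly. The sketch below assumes, as the Telegram Bot API documentation states, that entity offset and length are counted in UTF-16 code units. mb_substr counts code points instead, so every emoji outside the Basic Multilingual Plane shifts all later offsets by one. The helper names utf16Substr and replaceTextLinks are hypothetical (not part of any library), and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: take a substring by UTF-16 code units, the unit in
// which Telegram documents entity "offset" and "length".
function utf16Substr(string $text, int $offset, int $length): string
{
    $utf16 = mb_convert_encoding($text, 'UTF-16LE', 'UTF-8');
    // Every UTF-16 code unit is exactly 2 bytes, so byte offsets are doubled.
    $slice = substr($utf16, $offset * 2, $length * 2);
    return mb_convert_encoding($slice, 'UTF-8', 'UTF-16LE');
}

// Hypothetical helper: replace the visible text of each text_link entity
// with the entity's URL, working on the UTF-16 form of the message.
function replaceTextLinks(string $text, array $entities): string
{
    $utf16 = mb_convert_encoding($text, 'UTF-16LE', 'UTF-8');

    // Process entities from last to first so that a replacement never
    // moves the offsets of entities that are still waiting to be handled.
    usort($entities, function ($a, $b) {
        return $b['offset'] <=> $a['offset'];
    });

    foreach ($entities as $entity) {
        if (($entity['type'] ?? '') !== 'text_link' || empty($entity['url'])) {
            continue;
        }
        $url = mb_convert_encoding($entity['url'], 'UTF-16LE', 'UTF-8');
        $utf16 = substr_replace($utf16, $url, $entity['offset'] * 2, $entity['length'] * 2);
    }

    return mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE');
}

// Usage: the emoji takes 2 UTF-16 code units, so "click here" starts at 3.
$text = "\u{1F600} click here now";
$entities = [
    ['type' => 'text_link', 'offset' => 3, 'length' => 10, 'url' => 'https://amzn.to/abc'],
];

echo mb_substr($text, 3, 10), "\n";   // prints "lick here " (off by one)
echo utf16Substr($text, 3, 10), "\n"; // prints "click here"
echo replaceTextLinks($text, $entities), "\n";
```

In this sketch htmlspecialchars should be applied only after the replacement: escaping first changes the string length (for example, & becomes &amp;), so the entity offsets would no longer line up with the text.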