utf-8, tcl, eggdrop

Trouble converting utf-8 characters to html entities through eggdrop


To get this out of the way first: I have already recompiled eggdrop with UTF-8 support. I can echo UTF-8 characters if I write the escape sequences in strings (\u00a7), but for some reason I cannot yet figure out, I am unable to match them against their counterparts using a regex.

I am attempting to develop a logging script for eggdrop, written in Tcl. I've already spent a few hours doing nothing but research, but either there isn't any help out there, or I'm looking in the wrong places.

A user types an input string, §, in an IRC channel that the bot is on. The logging script, on the Linux side, seems to interpret this character as a special control character (I think), and gedit renders it as a boxed two-line glyph that reads 'FFA7', with FF on the first line and A7 on the second.

My regex is quite simple:

regexp -all {\u00a7} $text

I have of course also tried:

regexp -all {\247} $text

Unfortunately, as already stated, neither works. I get 0 every time, meaning the pattern never matches the character.

For all the research I've done, I've been unable to figure out what format eggdrop uses when it sends strings to the Tcl script. The only thing that has worked is copying that box-like character from gedit directly into the script, but since I cannot reproduce this character any other way, that makes it rather impractical to code against.
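One way to see what the script actually receives is to dump each character of the incoming string as a code point. The following is only a minimal sketch: `dump_codepoints` is a hypothetical helper name, and the raw bytes are simulated with `encoding convertto` rather than taken from a real eggdrop bind, so what the bot really delivers may differ.

```tcl
# Hypothetical helper: list every character of a string as U+XXXX.
proc dump_codepoints {text} {
    set out {}
    foreach ch [split $text ""] {
        # scan with %c yields the numeric code point of the character
        scan $ch %c code
        lappend out [format "U+%04X" $code]
    }
    return [join $out " "]
}

# A properly decoded string holds a single character for the section sign.
set decoded "a\u00a7"
puts [dump_codepoints $decoded]

# Assumption: undecoded UTF-8 input looks like one character per byte.
set raw [encoding convertto utf-8 "\u00a7"]
puts [dump_codepoints $raw]
```

If the dump shows more characters than were typed, or code points other than the expected one, the string has probably not been decoded yet.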

So, to the question: what am I doing wrong here, and is there a correct way to interpret the string sent by the bot so that I can convert the special characters in it to HTML entities?


Solution

  • For those who are wondering, my testing suggests that I've solved this with a simple:

    set text [encoding convertfrom utf-8 $text]
    

    And my other functions work for replacing the escape sequences as they should. I don't know how I missed this earlier in my research.
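To show how the decode step fits together with the regex and the entity conversion, here is a minimal sketch. It makes a few assumptions: the raw input is simulated with `encoding convertto` instead of coming from eggdrop, `html_entities` is a hypothetical stand-in for the replacement functions mentioned above (which are not shown in the post), and it only converts non-ASCII characters to numeric entities, leaving `&`, `<`, and `>` untouched.

```tcl
# Hypothetical helper: replace every non-ASCII character with a
# numeric HTML entity such as &#167;.
proc html_entities {text} {
    set out ""
    foreach ch [split $text ""] {
        scan $ch %c code
        if {$code > 127} {
            append out "&#$code;"
        } else {
            append out $ch
        }
    }
    return $out
}

# Assumption: simulate undecoded UTF-8 input, one character per byte.
set raw [encoding convertto utf-8 "a \u00a7 b"]

# The fix from the answer: decode the bytes into a real Tcl string.
set text [encoding convertfrom utf-8 $raw]

puts [string length $raw]            ;# 6: the section sign takes two bytes
puts [string length $text]           ;# 5: now it is a single character
puts [regexp -all {\u00a7} $text]    ;# 1: the pattern matches once
puts [html_entities $text]           ;# a &#167; b
```

Decoding once, as early as possible, keeps the later regex and replacement code working on characters rather than on bytes.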