Code:
import socket, feedparser
feed = feedparser.parse("http://pwnmyi.com/feed")
latest = feed.entries[0]
art_name = latest.title
network = 'irc.rizon.net'
port = 6667
irc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
irc.connect((network, port))
print irc.recv(4096)
irc.send('NICK PwnBot\r\n')
irc.send('USER PwnBot PwnBot PwnBot :PwnBot by Fike\r\n')
irc.send('JOIN #pwnmyi\r\n')
while True:
    data = irc.recv(4096)
    if data.find('PING') != -1:
        irc.send('PONG ' + data.split()[1] + '\r\n')
    if data.find('!latest') != -1:
        irc.send('PRIVMSG #pwnmyi :Latest Article: ' + art_name + '\r\n')
It connects fine, but when I type !latest in the channel, the bot quits with this error:
irc.send('PRIVMSG #pwnmyi :Latest Article: ' + art_name + '\r\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 55: ordinal not in range(128)
Could you please help me debug this code? It used to work for me before.
The IRC protocol does not define a particular character encoding for messages. Rather, it is an 8-bit protocol in which certain octets are reserved as control characters (see RFC 1459, section 2.2).
Apparently the popular mIRC client will decode UTF-8 sequences if it recognizes them as such. UTF-8 is a sensible choice for IRC: ASCII code points are encoded as the same single bytes as in plain ASCII, and every byte of a non-ASCII code point has a value > 127, so encoded text cannot be confused with ASCII characters.
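The error arises because feedparser returns the title as a unicode string, so the concatenated message is unicode too, and irc.send() then tries to encode it with the default ascii codec, which fails on u'\u2013' (an en dash). Encode the message as UTF-8 yourself before sending it. In Python, that's spelled unicode.encode(encoding='utf8'), like so: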
>>> u'\u0ca0_\u0ca0'.encode('utf8')
'\xe0\xb2\xa0_\xe0\xb2\xa0'
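Applied to your bot, the fix might look something like the sketch below. build_privmsg is a helper name I made up, and the sample title is invented so that it contains the same en dash as in your traceback. The idea is to build the whole line as unicode first and encode it once, right before it goes to the socket.

```python
def build_privmsg(channel, text):
    """Return a complete PRIVMSG line as UTF-8 encoded bytes."""
    # Build the line as unicode, then encode once at the boundary.
    line = u'PRIVMSG %s :%s\r\n' % (channel, text)
    return line.encode('utf8')

# Hypothetical title containing an en dash (u'\u2013'), as in the traceback.
art_name = u'Some article \u2013 part 2'
payload = build_privmsg(u'#pwnmyi', u'Latest Article: ' + art_name)

# In the bot's main loop this would replace the failing call:
#     irc.send(payload)
```

The en dash ends up on the wire as the three bytes '\xe2\x80\x93', which UTF-8 aware clients should display correctly.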