I want to display the full cp437 character set using the equivalent Unicode characters, encoded as utf-8.
I have all the code points for each of the cp437 characters.
The following code correctly displays a single cp437 character:
import locale
locale.setlocale(locale.LC_ALL, '')
icon = u'\u263A'.encode('utf-8')
print icon
Whereas the following code shows most of the cp437 characters, but not all:
for i in range(0x00,0x100):
    print chr(i).decode('cp437')
My guess is that the 2nd approach is not referencing the utf-8 encoding, but a separate incomplete cp437 character set.
I would like a way to summon a cp437 character from the utf-8 without having to specify each of the 256 individual code points. I have resorted to manually typing the unicode code point strings in a massive 16x16 table. Is there a better way?
The following code demonstrates this:
from curses import *
import locale
locale.setlocale(locale.LC_ALL, '')
def main(stdscr):
    maxyx = stdscr.getmaxyx()
    text = str(maxyx)
    y_mid = maxyx[0]//2
    x_mid = maxyx[1]//2
    next_y, next_x = y_mid, x_mid
    curs_set(1)
    noecho()
    event = 1
    y = 0; x = 0
    icon1 = u'\u2302'.encode('utf-8')
    icon2 = chr(0x7F).decode('cp437')
    while event != ord('q'):
        stdscr.addstr(y_mid, x_mid-10, icon1)
        stdscr.addstr(y_mid, x_mid+10, icon2)
        event = stdscr.getch()
wrapper(main)
The icon on the left is from utf-8 and prints to the screen. The icon on the right is from decode('cp437') and does not print correctly (it appears as ^?).
As mentioned by @Martijn in the comments, the stock cp437 decoder converts characters 0-127 straight into their ASCII equivalents. For some applications this would be the right thing, as you wouldn't for example want '\n' to translate to u'\u25d9'. But for full fidelity to the code page, you need a custom decoder and encoder.
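To see that behaviour directly, here is a minimal sketch using only the stock cp437 codec. The variable names are illustrative, and it is written to run on both Python 2.7 and Python 3.

```python
# Decode every byte below 0x80 with the stock cp437 codec.
low_bytes = bytes(bytearray(range(0x80)))
low_text = low_bytes.decode('cp437')

# Each byte below 0x80 keeps its own value as the code point.
identity = [ord(ch) for ch in low_text] == list(range(0x80))

# So 0x7F decodes to U+007F (DELETE), not the house glyph U+2302.
del_char = b'\x7f'.decode('cp437')

# From 0x80 upward the stock codec does use the code page's own glyphs.
c_cedilla = b'\x80'.decode('cp437')

print(identity)
print(repr(del_char))
print(repr(c_cedilla))
```

A DELETE character reaching curses would be consistent with the ^? shown in the question, since that is how curses typically renders it.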
The codecs module makes it easy to add your own codecs, but examples are hard to find. I used the sample at http://pymotw.com/2/codecs/ along with the Wikipedia table for Code page 437 to generate this module. It automatically registers a codec with the name 'cp437ex' when you import it.
import codecs
codec_name = 'cp437ex'
_table = u'\0\u263a\u263b\u2665\u2666\u2663\u2660\u2022\u25d8\u25cb\u25d9\u2642\u2640\u266a\u266b\u263c\u25ba\u25c4\u2195\u203c\xb6\xa7\u25ac\u21a8\u2191\u2193\u2192\u2190\u221f\u2194\u25b2\u25bc !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u2302\xc7\xfc\xe9\xe2\xe4\xe0\xe5\xe7\xea\xeb\xe8\xef\xee\xec\xc4\xc5\xc9\xe6\xc6\xf4\xf6\xf2\xfb\xf9\xff\xd6\xdc\xa2\xa3\xa5\u20a7\u0192\xe1\xed\xf3\xfa\xf1\xd1\xaa\xba\xbf\u2310\xac\xbd\xbc\xa1\xab\xbb\u2591\u2592\u2593\u2502\u2524\u2561\u2562\u2556\u2555\u2563\u2551\u2557\u255d\u255c\u255b\u2510\u2514\u2534\u252c\u251c\u2500\u253c\u255e\u255f\u255a\u2554\u2569\u2566\u2560\u2550\u256c\u2567\u2568\u2564\u2565\u2559\u2558\u2552\u2553\u256b\u256a\u2518\u250c\u2588\u2584\u258c\u2590\u2580\u03b1\xdf\u0393\u03c0\u03a3\u03c3\xb5\u03c4\u03a6\u0398\u03a9\u03b4\u221e\u03c6\u03b5\u2229\u2261\xb1\u2265\u2264\u2320\u2321\xf7\u2248\xb0\u2219\xb7\u221a\u207f\xb2\u25a0\xa0'
decoding_map = { i: ord(ch) for i, ch in enumerate(_table) }
encoding_map = codecs.make_encoding_map(decoding_map)
class Codec(codecs.Codec):
    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_map)
    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_map)
class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_map)[0]
class StreamReader(Codec, codecs.StreamReader):
    pass
class StreamWriter(Codec, codecs.StreamWriter):
    pass
def _register(encoding):
    if encoding == codec_name:
        return codecs.CodecInfo(
            name=codec_name,
            encode=Codec().encode,
            decode=Codec().decode,
            incrementalencoder=IncrementalEncoder,
            incrementaldecoder=IncrementalDecoder,
            streamreader=StreamReader,
            streamwriter=StreamWriter)
codecs.register(_register)
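If you would rather not type out all 256 entries, one possible shortcut (my own variation, not how the table above was generated) is to let the stock cp437 codec supply the printable and upper ranges and patch in only the 0x00-0x1F and 0x7F glyphs. The helper names decode_cp437_full and encode_cp437_full are hypothetical, and the sketch is meant to run on Python 2.7 and Python 3 without registering a codec.

```python
import codecs

# Glyphs for bytes 0x00-0x1F, copied from the start of _table above.
_low = (u'\0\u263a\u263b\u2665\u2666\u2663\u2660\u2022\u25d8\u25cb\u25d9\u2642'
        u'\u2640\u266a\u266b\u263c\u25ba\u25c4\u2195\u203c\xb6\xa7\u25ac\u21a8'
        u'\u2191\u2193\u2192\u2190\u221f\u2194\u25b2\u25bc')

# Let the stock codec decode all 256 bytes, then replace the control range.
stock = bytes(bytearray(range(0x100))).decode('cp437')
table = _low + stock[0x20:0x7F] + u'\u2302' + stock[0x80:]

# Same map construction as in the module above.
decoding_map = dict((i, ord(ch)) for i, ch in enumerate(table))
encoding_map = codecs.make_encoding_map(decoding_map)

def decode_cp437_full(data):
    """Hypothetical helper: byte string -> Unicode string, all 256 glyphs."""
    return codecs.charmap_decode(data, 'strict', decoding_map)[0]

def encode_cp437_full(text):
    """Hypothetical helper: Unicode string -> cp437 byte string."""
    return codecs.charmap_encode(text, 'strict', encoding_map)[0]

house = decode_cp437_full(b'\x7f')    # the house glyph, U+2302
smiley = decode_cp437_full(b'\x01')   # the smiley face, U+263A
```

The resulting table should match _table above, so it could be dropped into the module in place of the long literal if you prefer.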
Also note that decode produces Unicode strings, while encode produces byte strings. Printing a Unicode string should always work, but your question indicates you may have an incorrect default encoding. One of these should work:
icon2='\x7f'.decode('cp437ex')
icon2='\x7f'.decode('cp437ex').encode('utf-8')
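To make the decode/encode distinction concrete, here is a small sketch that uses the stock utf-8 codec so it runs without the module above. The variable names are illustrative. locale.getpreferredencoding() is one way to check which encoding Python assumes for your locale.

```python
import locale

house_bytes = b'\xe2\x8c\x82'              # UTF-8 bytes for U+2302
house_text = house_bytes.decode('utf-8')   # decode: bytes -> Unicode string
round_trip = house_text.encode('utf-8')    # encode: Unicode string -> bytes

# The encoding Python assumes for the current locale. If this is not
# UTF-8, output of non-ASCII text may fail or come out garbled.
preferred = locale.getpreferredencoding()
print(preferred)
```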