Tags: python, unicode, terminal, string-formatting, python-unicode

How can I handle these weird special characters that mess up my print formatting?


I am printing a formatted table, but some user-generated characters take up more than one character width on screen, which breaks the alignment, as the screenshot below shows.

[Screenshot: table output in which the title column overflows and pushes the following columns out of alignment]

The "title" column is formatted to a width of 68 characters (the format spec counts characters, not bytes). However, some of these special characters take up more than one character width on screen while still being counted as a single character, which pushes the column past its bounds.

print('{0:16s}{3:<18s}{1:68s}{2:>8n}'.format((
    ' ' + streamer['user_name'][:12] + '..') if len(streamer['user_name']) > 12 else ' ' + streamer['user_name'],
    (streamer['title'].strip()[:62] + '..') if len(streamer['title']) > 62 else streamer['title'].strip(),
    streamer['viewer_count'],
    (gamesDic[streamer['game_id']][:15] + '..') if len(gamesDic[streamer['game_id']]) > 15 else gamesDic[streamer['game_id']]))
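The root cause is that `str.format` pads to a count of code points, not terminal columns. Below is a small illustration using only the standard library. `display_width` is a hypothetical helper that assumes East Asian Width 'W' and 'F' characters occupy two columns, which is how most terminals draw emoji such as 🔴. The exact on-screen result still depends on the terminal and font.

```python
import unicodedata


def display_width(s: str) -> int:
    """Approximate terminal columns for s.

    Assumption: East Asian Width 'W' (wide) and 'F' (fullwidth) characters
    take two columns and everything else takes one. Combining marks and
    zero-width characters are not handled here.
    """
    return sum(2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
               for ch in s)


# Same kind of format spec as in the question, just narrower.
cell = '{:10s}'.format('🔴 LIVE')
print(repr(cell))
print(len(cell))            # 10: format() padded to 10 code points
print(display_width(cell))  # 11: the emoji occupies two columns
```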

Any advice on how to deal with these special characters?

Edit: I printed the offending string to a file:

🔴 𝐀𝐒𝐌𝐑 (𝙪𝙥 𝙘𝙡𝙤𝙨𝙚) ✨ LIVE 🔔 SUBS GET SNAPCHAT
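To see which characters in that string cause the trouble, one option is to list each distinct non-ASCII character with its code point, its East Asian Width category and its Unicode name, using the standard `unicodedata` module. `describe` is a hypothetical helper name.

```python
import unicodedata

title = '🔴 𝐀𝐒𝐌𝐑 (𝙪𝙥 𝙘𝙡𝙤𝙨𝙚) ✨ LIVE 🔔 SUBS GET SNAPCHAT'


def describe(s: str):
    """Return (code point, East Asian Width, name) for each distinct non-ASCII char."""
    rows, seen = [], set()
    for ch in s:
        if ord(ch) < 128 or ch in seen:
            continue
        seen.add(ch)
        rows.append((f'U+{ord(ch):04X}',
                     unicodedata.east_asian_width(ch),
                     unicodedata.name(ch, '<unnamed>')))
    return rows


# 'W'/'F' are normally drawn two columns wide, 'N'/'Na'/'H' one column,
# and 'A' (ambiguous) depends on the terminal's settings.
for row in describe(title):
    print(*row, sep='  ')
```

The emoji (🔴, ✨, 🔔) should come out as 'W'. The styled letters such as 𝐀 belong to the Mathematical Alphanumeric Symbols block and are not classified as wide, so any misalignment they cause likely comes from how the font renders them rather than from their width category.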

Edit 2:

Why don't these align on a character boundary?

[Screenshot: characters that do not line up on a character boundary]

Edit 3:

Today the first two characters of the title produce odd-looking output, yet the columns stay aligned in each of the cases below.

First character in isolation, title[0]:

[Screenshot: output for title[0]]

Second character in isolation, title[1]:

[Screenshot: output for title[1]]

First and second characters together, title[0] + title[1]:

[Screenshot: output for the first and second characters together]


Solution

  • I've written a custom string formatter based on @snakecharmerb's comment, but the "half character width" problem still persists:

    import unicodedata
    
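    def fstring(string, max_length, align='l'):
        """Pad or truncate `string` to `max_length` terminal columns."""
        string = str(string)
        # Count characters that a terminal draws two columns wide.
        # Note: only 'F' (fullwidth) is checked; most emoji are classified 'W'.
        extra_length = 0
        for char in string:
            if unicodedata.east_asian_width(char) == 'F':
                extra_length += 1

        diff = max_length - len(string) - extra_length
        if diff > 0:
            # Left-align ('l') pads on the right; any other value pads on the left.
            return string + diff * ' ' if align == 'l' else diff * ' ' + string
        elif diff < 0:
            # Truncation slices by code points, so wide characters kept in the
            # slice can still push the column past max_length.
            return string[:max_length-3] + '.. '

        return string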
    
    data = [{'user_name': 'shroud', 'game_id': 'Apex Legends', 'title': 'pathfinder twitch prime loot YAYA @shroud on socials for update', 'viewer_count': 66200},
            {'user_name': 'Amouranth', 'game_id': 'ASMR', 'title': '🔴 𝐀𝐒𝐌𝐑 (𝙪𝙥 𝙘𝙡𝙤𝙨𝙚) ✨ LIVE 🔔 SUBS GET SNAPCHAT', 'viewer_count': 2261}]
    
    for d in data:
        name = fstring(d['user_name'], 20)
        game_id = fstring(d['game_id'], 15)
        title = fstring(d['title'], 62)
        count = fstring(d['viewer_count'], 10, align='r')
        print('{}{}{}{}'.format(name, game_id, title, count))
    

    It produces this output:

    [Screenshot: table output produced by the code above]

    (I can't post it as text, since the formatting would be lost.)
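A possible refinement of the formatter above is to measure the width of each character and truncate by columns instead of by code points. This sketch assumes the terminal draws East Asian Width 'W' and 'F' characters in two columns and combining marks and format characters (such as the zero-width joiner) in zero. The helper names `char_width`, `str_width` and `fit` are hypothetical. It cannot correct glyphs that a font draws at a fractional width, which may be where the remaining "half character" offset comes from. The third-party `wcwidth` package offers `wcswidth()` for a more complete version of this lookup.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate number of terminal columns a single code point occupies."""
    if unicodedata.category(ch) in ('Mn', 'Me', 'Cf'):
        return 0  # combining marks and format chars (e.g. ZWJ) take no column
    if unicodedata.east_asian_width(ch) in ('W', 'F'):
        return 2  # wide/fullwidth: most emoji and CJK characters
    return 1


def str_width(s: str) -> int:
    return sum(char_width(ch) for ch in s)


def fit(s, width: int, align: str = 'l', ellipsis: str = '..') -> str:
    """Truncate and pad s so that it occupies exactly `width` columns."""
    s = str(s)
    if str_width(s) > width:
        # Keep characters while they fit in the room left for the ellipsis.
        budget = width - str_width(ellipsis)
        out, used = [], 0
        for ch in s:
            w = char_width(ch)
            if used + w > budget:
                break
            out.append(ch)
            used += w
        s = ''.join(out) + ellipsis
    # Pad by display width, not by len(), so wide characters are accounted for.
    pad = ' ' * (width - str_width(s))
    return s + pad if align == 'l' else pad + s


data = [('shroud', 'Apex Legends', 'pathfinder twitch prime loot YAYA @shroud on socials for update', 66200),
        ('Amouranth', 'ASMR', '🔴 𝐀𝐒𝐌𝐑 (𝙪𝙥 𝙘𝙡𝙤𝙨𝙚) ✨ LIVE 🔔 SUBS GET SNAPCHAT', 2261)]

for name, game, title, count in data:
    print(fit(name, 20) + fit(game, 15) + fit(title, 62) + fit(count, 10, align='r'))
```

If a wide character would straddle the cut, it is dropped and the gap is filled with a space, so every cell stays exactly `width` columns under the stated assumption.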