Tags: python, label, beautifulsoup, gtk3, pygobject

How to set a BeautifulSoup.Tag as a label in Gtk3 using Python


I'm working on a program that fetches pictures from the APOD website and shows each picture's information (the part called Explanation on the site, right below the picture). Here is a simplified class that implements only the part that retrieves the information about the picture:

import urllib2
import datetime

from BeautifulSoup import BeautifulSoup
from gi.repository import Gtk

class InfoAPOD(Gtk.Window):
    """View info about selected APOD image"""

    def __init__(self):
        """Initialize the window"""
        Gtk.Window.__init__(self)
        self.set_default_size(600, 500)
        self.set_position(Gtk.WindowPosition.CENTER)
        self.set_border_width(3)

        self.grid = Gtk.Grid()
        self.add(self.grid)

        self.scrolledwindow = Gtk.ScrolledWindow()
        self.scrolledwindow.set_vexpand(True)
        self.scrolledwindow.set_hexpand(True)
        self.grid.add(self.scrolledwindow)

        self.label = Gtk.Label()
        self.scrolledwindow.add_with_viewport(self.label)

        date = datetime.date.today()
        page = "ap" + date.strftime('%y%m%d') + ".html"
        base_url = "http://apod.nasa.gov/apod/"
        apod_url = base_url + page

        apod_htm = urllib2.urlopen(apod_url).read()
        soup = BeautifulSoup(apod_htm)
        tag_b = soup.findAll('b')
        tag_p = soup.findAll('p')

        apod_dat = date.strftime('%Y %h %d')
        apod_tit = tag_b[0].string.strip()
        apod_inf = str(tag_p[2])

        name = "APOD from " + apod_dat + " - " + apod_tit
        self.set_title(name)

        text = apod_inf.replace('<p>', '').replace('</p>', '')
        self.label.set_markup(text)
        self.label.set_justify(Gtk.Justification.LEFT)
        self.label.set_line_wrap(True)

def main():
    """Show the window"""
    win = InfoAPOD()
    win.connect('delete-event', Gtk.main_quit)
    win.show_all()
    Gtk.main()
    return 0

if __name__ == '__main__':
    main()

The problem is that the text lines are broken and do not form a continuous paragraph (if you take a look at the APOD site you will see what I mean). Maybe a picture is worth a thousand words:

[Image: information for APOD picture]

In short, I'm using urllib2 to fetch the web page and BeautifulSoup to parse the tags, as shown in the code above. Then I isolate the part of the information that I want (of type BeautifulSoup.Tag) and convert it to a string in order to set it as the label text using markup. I've read some documentation and examples about BeautifulSoup, but I couldn't improve the appearance of the text.

Any advice on how to improve the appearance of the text in the window is appreciated.
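
For context on where the broken lines come from: Gtk.Label renders every newline character in its markup as a line break, whereas a browser collapses the newlines in HTML source into spaces. A minimal sketch, using a made-up fragment in place of the real str(tag_p[2]) output (the actual APOD markup will differ), makes the embedded newlines visible:

```python
# Hypothetical stand-in for str(tag_p[2]); the real page content will differ.
apod_inf = (
    "<p> <b> Explanation: </b>\n"
    "First line of the text\n"
    "continues on a second line\n"
    "and ends on a third.\n"
    "</p>"
)

# Same tag stripping as in the question's code.
text = apod_inf.replace('<p>', '').replace('</p>', '')

# repr() shows the '\n' characters that the label will turn into line breaks.
print(repr(text))

# Number of hard line breaks the label would display for this fragment.
hard_breaks = text.count('\n')
print(hard_breaks)
```

If the count is roughly one per source line, the line breaks are coming from the page source rather than from the label's own wrapping.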


Solution

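  • Do you want to remove the line breaks? What about replacing each newline with a space, so that words on adjacent lines are not fused together:

    text = apod_inf.replace('<p>', '').replace('</p>', '').replace('\n', ' ')

    And if you don't want the result to be that cluttered, you can try something more selective, like: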

    foo.replace('\n\n', 'SOMETOKEN').replace('\n', 'SOMETOKEN', 1).replace('\n', '', 1).replace('SOMETOKEN', '\n')
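
As an alternative to chaining replace() calls, the same idea can be sketched with a small helper that joins hard-wrapped lines into paragraphs while keeping blank-line paragraph breaks. This is only a sketch: the name collapse_line_breaks and the sample string are made up for illustration, and it assumes paragraphs in the source are separated by blank lines.

```python
import re


def collapse_line_breaks(text):
    """Join hard-wrapped lines into paragraphs, keeping paragraph breaks."""
    # A paragraph break is a newline, optional whitespace, then another newline.
    paragraphs = re.split(r'\n\s*\n', text.strip())
    # str.split() with no arguments splits on any run of whitespace, so
    # re-joining with one space removes newlines and repeated spaces.
    return '\n\n'.join(' '.join(p.split()) for p in paragraphs)


# Hypothetical sample text, not taken from the APOD page.
sample = "First line\nsecond line\n\nNew paragraph\ncontinues here\n"
print(collapse_line_breaks(sample))
```

In the question's code this would be applied to text before passing it to self.label.set_markup(), leaving the wrapping itself to set_line_wrap(True).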