
Putting website titles into an Excel spreadsheet


I am trying to use BeautifulSoup to get a list of website titles and put them into an Excel spreadsheet.

The text file “c:\websites.txt” contains the following:

www.dailynews.com
www.dailynews.lk
www.dailynews.co.zw
www.gulf-daily-news.com
www.dailynews.gov.bw

My code so far:

from bs4 import BeautifulSoup
import urllib2
import xlwt

list_open = open('c:\\websites.txt')
read_list = list_open.read()
line_in_list = read_list.split('\n')

for websites in line_in_list:

    url = "http://" + websites
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())

    site_title = soup.find_all("title")

    print site_title

It works fine and prints the site titles. However, when I add the following:

    book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
    sheet = book.add_sheet('Sheet1', cell_overwrite_ok = True)

    for cor, lmn in enumerate(line_in_list):

        sheet.write (cor, 0, site_title)

book.save("C:\\site_titles.xls")

to write them one per row into column A of an Excel spreadsheet, it doesn’t work.


Solution

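  • The error occurs because you are trying to write a BeautifulSoup object into a cell. soup.find_all("title") returns a list of Tag objects, and xlwt cannot write a Tag, so it raises: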

    Exception: Unexpected data type <class 'bs4.element.Tag'>
    

    Write the text of the tag instead, and the file will be saved without errors:

    for cor, lmn in enumerate(line_in_list):
        sheet.write (cor, 0, site_title[0].text)
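To see the difference between the Tag and its text, here is a small Python 3 sketch, assuming the bs4 package is installed. title_text is a hypothetical helper, not part of BeautifulSoup. It uses get_text(strip=True) to trim surrounding whitespace and returns an empty string for a page without a <title>, a case where site_title[0] would raise IndexError.

```python
from bs4 import BeautifulSoup


def title_text(html):
    """Return the stripped text of the first <title> tag, or '' if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all("title")  # list-like ResultSet of Tag objects, not strings
    if not tags:
        return ""  # site_title[0] would raise IndexError here
    return tags[0].get_text(strip=True)  # a plain str, which xlwt can write


html = "<html><head><title> Daily News </title></head><body></body></html>"
print(type(BeautifulSoup(html, "html.parser").find_all("title")[0]))  # <class 'bs4.element.Tag'>
print(repr(title_text(html)))  # 'Daily News'
```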
    


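    The write loop is also wrong: in your version the workbook is created inside the for loop, and the inner loop writes the same site_title into every row. Create the workbook once, then write each title as you fetch it. Final script: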

    from bs4 import BeautifulSoup
    import urllib2
    import xlwt
    
    # Read the site list, skipping blank lines (such as a trailing newline)
    with open('c:\\websites.txt') as f:
        line_in_list = [line.strip() for line in f if line.strip()]
    
    # Create the workbook and sheet once, before the loop
    book = xlwt.Workbook(encoding='utf-8', style_compression=0)
    sheet = book.add_sheet('Sheet1', cell_overwrite_ok=True)
    
    for cor, websites in enumerate(line_in_list):
        url = "http://" + websites
        page = urllib2.urlopen(url)
        soup = BeautifulSoup(page.read(), "html.parser")
        site_title = soup.find_all("title")
        print site_title
        # Write the title's text (a string), not the Tag object, into column A
        sheet.write(cor, 0, site_title[0].text)
    
    book.save("site_titles.xls")
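The scripts in this thread are Python 2 code (urllib2 and the print statement do not exist in Python 3). A possible Python 3 adaptation is sketched below, assuming bs4 and xlwt are installed. The helper names read_sites, fetch_title, write_titles and main are hypothetical, the 10-second timeout is an arbitrary choice, blank lines in the site list are skipped, and a failed download is recorded as an error message in its row instead of stopping the run.

```python
import urllib.request

import xlwt
from bs4 import BeautifulSoup


def read_sites(path):
    """Return the stripped, non-blank lines of the site list file."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def fetch_title(site, timeout=10):
    """Download http://<site> and return its <title> text ('' if missing)."""
    with urllib.request.urlopen("http://" + site, timeout=timeout) as page:
        soup = BeautifulSoup(page.read(), "html.parser")
    return soup.title.get_text(strip=True) if soup.title else ""


def write_titles(titles, out_path):
    """Write one title per row into column A of a new .xls workbook."""
    book = xlwt.Workbook(encoding="utf-8")
    sheet = book.add_sheet("Sheet1")
    for row, title in enumerate(titles):
        sheet.write(row, 0, title)  # plain str values only, never Tag objects
    book.save(out_path)


def main(list_path, out_path):
    """Fetch the title of every site in list_path and save them to out_path."""
    titles = []
    for site in read_sites(list_path):
        try:
            titles.append(fetch_title(site))
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            titles.append("ERROR: %s" % exc)
    write_titles(titles, out_path)
```

Calling main(r"c:\websites.txt", "site_titles.xls") reads the site list and writes one title per row into column A. Keeping the download, parsing and writing steps in separate functions makes each one easy to test on its own. Note that xlwt only produces the older .xls format.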