python, python-3.x, python-docx

Python - Is there any way to create a headline for the extracted .docx file?


Here is my code:

from selenium import webdriver
from bs4 import BeautifulSoup
from docx import Document
import pandas as pd
import requests

user_agent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.37"
url = "https://emetonline.org/events/past-events/"
data = requests.get(url, headers={"User-Agent": user_agent})
soup = BeautifulSoup(data.text, "lxml")

document = Document()

events = soup.find_all("div", class_="col-12")
for event in events:
    event_name = event.find("h4")
    if event_name is None:
        # Skip containers without an <h4> event title.
        continue
    print(event_name.text)
    document.add_paragraph(event_name.text, style='List Bullet')

document.save('demo.docx')

I want the document to have a headline in Times New Roman at 14 pt. Is there any way to do that?


Solution

    from bs4 import BeautifulSoup
    from docx import Document
    from docx.shared import Pt
    import requests
    
    user_agent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.37"
    url = "https://emetonline.org/events/past-events/"
    data = requests.get(url, headers={"User-Agent": user_agent})
    soup = BeautifulSoup(data.text, "lxml")
    
    document = Document()
    
    heading = document.add_heading().add_run("Past Events")
    heading.font.name = "Times New Roman"
    heading.font.size = Pt(14)
    
    events = soup.find_all("div", class_="col-12")
    for event in events:
        event_name = event.find("h4")
        if event_name is None:
            # Skip containers without an <h4> event title.
            continue
        print(event_name.text)
        document.add_paragraph(event_name.text, style='List Bullet')
    
    document.save('demo.docx')
    

    I've added the extra import at the top:

    from docx.shared import Pt
    

    Pt is the length class from docx.shared; it lets us give the font size in points, 14 in this case.

    Then I added this code, which creates an empty heading paragraph, adds the headline text to it as a run, and sets the font on that run:

    heading = document.add_heading().add_run("Past Events")
    heading.font.name = "Times New Roman"
    heading.font.size = Pt(14)