python web-scraping beautifulsoup html-parsing

Select multiple elements with BeautifulSoup and manage them individually

I am using BeautifulSoup to parse a webpage of poetry. The page marks each poem title with an h3 and each line of the poem with the class .line. I can select both kinds of element and add them to a list, but I want to convert each h3 to uppercase, follow it with an empty entry to indicate a line break, and insert it into the lines list at the position where it occurs.

    linesArr = []
    for lines in full_text:
        booktitles = lines.select('h3')
        for booktitle in booktitles:
            linesArr.append(booktitle.text.upper())
            linesArr.append('')
        for line in lines.select('h3, .line'):
            linesArr.append(line.text)

This code appends all the book titles to the beginning of the list, then goes on to append the h3 and .line items again. I have also tried checking for the h3 inside a single loop, like this:

    linesArr = []
    for lines in full_text:
        for line in lines.select('h3, .line'):
            if line.find('h3'):
                linesArr.append(line.text.upper())
                linesArr.append('')
            else:
                linesArr.append(line.text)

Solution

I'm not sure exactly what you are trying to do, but this way you get a list with the title in upper case followed by all of your lines:

    #!/usr/bin/python3
    # coding: utf8

    from bs4 import BeautifulSoup
    import requests

    page = requests.get("https://quod.lib.umich.edu/c/cme/CT/1:1?rgn=div2;view=fulltext")
    soup = BeautifulSoup(page.text, 'html.parser')

    title = soup.find('h3')
    full_lines = soup.find_all('div',{'class':'line'})

    linesArr = []
    linesArr.append(title.get_text().upper())
    for line in full_lines:
        linesArr.append(line.get_text())

    # Print full array with the title and text
    print(linesArr)

    # Print text here with line break
    for linea in linesArr:
        print(linea + '\n')
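
If the page contains several h3 titles and each one should stay in its original position, a single pass over `select('h3, .line')` may be closer to what the question asks for. The second attempt in the question uses `line.find('h3')`, which searches the descendants of the element, so it returns `None` when the element is itself the h3. Checking `line.name` instead tests the tag of the element itself. The following is only a sketch: `SAMPLE_HTML` and `collect_lines` are hypothetical names, and the inline markup is an invented stand-in for the real page, so the selectors may need adjusting.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the poetry page (an assumption,
# not the real page structure).
SAMPLE_HTML = """
<div class="poem">
  <h3>First Poem</h3>
  <div class="line">line one</div>
  <div class="line">line two</div>
  <h3>Second Poem</h3>
  <div class="line">line three</div>
</div>
"""


def collect_lines(html):
    """Return titles (uppercased, followed by '') and lines in page order."""
    soup = BeautifulSoup(html, "html.parser")
    lines_arr = []
    # With a comma-separated selector, matches come back in document order,
    # so each title stays in front of the lines that follow it.
    for element in soup.select("h3, .line"):
        if element.name == "h3":
            # The element itself is an h3: uppercase it and add a blank entry.
            lines_arr.append(element.get_text().upper())
            lines_arr.append("")
        else:
            lines_arr.append(element.get_text())
    return lines_arr


result = collect_lines(SAMPLE_HTML)
print(result)
```

To use this on the live page, pass `page.text` from the `requests.get` call to `collect_lines` instead of the sample string.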