Is there a way to find class name and take the whole text of parent tag?

I have a lot of html files and I have to take the full header of files. Tags of headers located differently: class="c6", class="c7"

I have tried BeautifulSoup

for head_c6 in soup.find_all('span', attrs={'class': 'c6'}):
        print(head_c6.get_text())
for head_c7 in soup.find_all('span', attrs={'class': 'c7'}):
        print(head_c7.get_text())

but the result:

Q3 2017 American Express Co Earnings Call - Final LENGTH:

Q2 2016 Akamai Technologies Inc Call - Final Earnings

Here how different files look like:

File 1

<div class="c4">
<p class="c5">
<span class="c6">
      Q3 2017 American Express Co Earnings Call - Final
     </span>
</p>
</div>
<div class="c4">
<p class="c5">
<span class="c7">
      LENGTH:
     </span>
<span class="c2">
      11051 words
     </span>
</p>
</div>

File 2

<div class="c4">
<p class="c5">
<span class="c6">
      Q2 2018 Akamai Technologies Inc
     </span>
<span class="c7">
      Earnings
     </span>
<span class="c6">
      Call - Final
     </span>
</p>
</div>

File 3

<div class="c4">
    <p class="c5">
     <span class="c6">
      Q4 2018
     </span>
     <span class="c7">
      Facebook
     </span>
     <span class="c6">
      Inc
     </span>
     <span class="c7">
      Earnings
     </span>
     <span class="c6">
      Call - Final
     </span>
    </p>

What I want is get full text of header:

Q3 2017 American Express Co Earnings Call - Final

Q2 2018 Akamai Technologies Inc Earnings Call - Final

Q4 2018 Facebook Inc Earnings Call - Final

Solution

Use Regular expression re I have updated the last file html.You can do it same with remaining files

from bs4 import BeautifulSoup
import re
data='''<div class="c4">
    <p class="c5">
     <span class="c6">
      Q4 2018
     </span>
     <span class="c7">
      Facebook
     </span>
     <span class="c6">
      Inc
     </span>
     <span class="c7">
      Earnings
     </span>
     <span class="c6">
      Call - Final
     </span>
    </p>'''

soup=BeautifulSoup(data,'html.parser')

items=[item.text.strip() for item in soup.find_all('span', class_=re.compile("c"))]
stritem=' '.join(items)
print(stritem.replace('\n',''))

Output:

 Q4 2018 Facebook Inc Earnings Call - Final

You can also use following way.

items=[item.text.strip() for item in soup.find_all('span', class_=re.compile("c6|c7"))]
stritem=' '.join(items)
print(stritem.replace('\n',''))

or to get the parent tag text try that.

from bs4 import BeautifulSoup
import re
data='''<div class="c4">
    <p class="c5">
     <span class="c6">
      Q4 2018
     </span>
     <span class="c7">
      Facebook
     </span>
     <span class="c6">
      Inc
     </span>
     <span class="c7">
      Earnings
     </span>
     <span class="c6">
      Call - Final
     </span>
    </p>'''

soup=BeautifulSoup(data,'html.parser')
childtag=soup.find('span', class_=re.compile("c6|c7"))
parenttag=childtag.parent
print(parenttag.text.replace('\n',''))