Removing Specific Span Tags from a CSV file

I am trying to remove specific span tags from a CSV file, but my code is deleting all of them. I only need to remove certain ones: the spans whose style sets the font family and font size. Other tags need to be kept, such as the STRONG tag that bolds text like a name. How can this be done with Python?

import re

# Matches every tag of any kind, which is why all of them get removed
CLEANR = re.compile('<.*?>')


def cleanhtml(raw_html):
    cleantext = re.sub(CLEANR, '', raw_html)
    return cleantext


# Read the input, clean each line, and write the result to a new file
with open("file.csv", 'r') as a_file, open("file2.csv", 'w') as newfile:
    for line in a_file:
        newfile.write(cleanhtml(line))

Solution

If your input is always an HTML string, you could use BeautifulSoup to parse it and drop only the attributes you don't want.

Here is an example:

from bs4 import BeautifulSoup

doc = '''<span style="font-family: verdana,geneva; font-size: 10pt;"><b>xyz</b></span>'''
soup = BeautifulSoup(doc, "html.parser")
# find_all(True) returns every tag and skips plain text nodes, which have no attrs
for tag in soup.find_all(True):
    # Drop any attribute (here, the whole style attribute) whose value
    # mentions font-family or font-size; all other attributes are kept
    tag.attrs = {
        name: value
        for name, value in tag.attrs.items()
        if 'font-family' not in value and 'font-size' not in value
    }
print(soup)

The output:

<span><b>xyz</b></span>
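
Note that the output above still contains the now-empty span. If the goal is to remove the span tag itself while keeping what is inside it, one option is BeautifulSoup's `unwrap()`. The following is only a sketch under that assumption; the function name `strip_font_spans` is made up for illustration, and the substring check on the style value is deliberately simple.

```python
from bs4 import BeautifulSoup


def strip_font_spans(raw_html):
    """Remove <span> tags that set font-family or font-size, keeping their contents."""
    soup = BeautifulSoup(raw_html, "html.parser")
    # find_all returns a list built up front, so unwrapping while looping is safe
    for span in soup.find_all("span"):
        style = span.get("style", "")
        if "font-family" in style or "font-size" in style:
            # unwrap() removes the tag itself but leaves its children in place
            span.unwrap()
    return str(soup)


print(strip_font_spans(
    '<span style="font-family: verdana,geneva; font-size: 10pt;"><b>xyz</b></span>'
))
# prints: <b>xyz</b>
```

Spans without those style properties, and tags such as `<b>` or `<strong>`, are left untouched.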

You can use the same attribute-filtering loop in your cleanhtml function like this:

from bs4 import BeautifulSoup

def cleanhtml(raw_html):
    soup = BeautifulSoup(raw_html, "html.parser")
    # find_all(True) returns every tag and skips plain text nodes
    for tag in soup.find_all(True):
        # Keep only attributes whose value mentions neither font property
        tag.attrs = {
            name: value
            for name, value in tag.attrs.items()
            if 'font-family' not in value and 'font-size' not in value
        }
    return str(soup)  # return as an HTML string
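
Since the data lives in a CSV file, it may be safer to clean each cell rather than each raw line: inside a quoted CSV field, double quotes are escaped by doubling them, so a raw line can contain `style=""..."` instead of valid HTML. Below is a minimal sketch using the standard `csv` module. The helper name `clean_csv` and its `clean` parameter are hypothetical, and UTF-8 encoding is an assumption about the file.

```python
import csv


def clean_csv(src_path, dst_path, clean):
    """Apply `clean` to every cell of src_path and write the result to dst_path."""
    # newline="" lets the csv module handle line endings and embedded newlines
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # csv.reader has already undone the CSV quoting, so each cell
            # is the plain HTML string that the cleaner expects
            writer.writerow([clean(cell) for cell in row])
```

With the function above, the call would look like `clean_csv("file.csv", "file2.csv", cleanhtml)`.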