
Python: How can I save several HTML files, naming each one after the content of its <title> tag?


I have Python code that works fine for parsing some data in HTML files. At the end of the code I must save each HTML file under a name taken from its title tag. For example, I have these 3 HTML files with 3 title tags:

<title>My name is Prince</title>
<title>I love Madonna</title>
<title>Cars and Candies</title>

Each of them must be saved like this:

my-name-is-prince.html
i-love-madonna.html
cars-and-candies.html
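A minimal sketch of the naming rule shown above, assuming file names should contain only lowercase ASCII letters, digits and hyphens (`title_to_filename` is a hypothetical helper name). Unlike a plain `replace(' ', '-')`, it also collapses punctuation, so a title such as "Cars & Candies!" does not end up with `&` or `!` in the file name.

```python
import re

def title_to_filename(title: str) -> str:
    """Turn the text of a <title> tag into a lowercase, hyphenated .html name."""
    # Lowercase first, then replace every run of characters that are not
    # ASCII letters or digits (spaces, punctuation, ...) with one hyphen.
    slug = re.sub(r'[^a-z0-9]+', '-', title.lower())
    # Remove hyphens left at either end, e.g. from a trailing "!".
    slug = slug.strip('-')
    # Assumption: fall back to a fixed name if nothing usable is left.
    return f"{slug or 'untitled'}.html"

for title in ["My name is Prince", "I love Madonna", "Cars and Candies"]:
    print(title_to_filename(title))
# my-name-is-prince.html
# i-love-madonna.html
# cars-and-candies.html
```

Note that non-ASCII letters (e.g. "é") are dropped by this pattern; widen the character class if your titles need them.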

I already have some code for extracting the title and for saving files in Python, but I don't know how to name each file after its title tag:

import re

try:
    title = re.search('<title.+/title>', html)[0]
    title_content = re.search('>(.+)<', title)[1]
except TypeError:
    # re.search() returned None, i.e. the page has no <title> tag
    title_content = None
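The regex above handles simple one-line titles, but it misses an upper-case `<TITLE>` or a title split across several lines. A sketch using the standard library's `html.parser.HTMLParser` instead; `TitleParser` and `extract_title` are hypothetical names, and it assumes only the first `<title>` in a page matters.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None  # stays None if the page has no <title>

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case, so <TITLE> matches too.
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_title(html: str):
    """Return the page title with whitespace collapsed, or None if missing."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    if parser.title is None:
        return None
    # Join lines and collapse repeated spaces inside the title.
    return " ".join(parser.title.split())

print(extract_title("<html><head><TITLE>My name\n  is Prince</TITLE></head></html>"))
# My name is Prince
```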


with open("my-words.html", "w") as some_file_handle:
    some_file_handle.write(finalString)

OR

with open('page_323.txt', 'w') as f:
    f.write(result.text) 

OR

with open("somefilename.txt", "w") as some_file_handle:  
    for line in data: 
        some_file_handle.write(line + "\n")

P.S. I have 500 files. The code must find the <title> tag in each HTML file and save each file as a new HTML file named after that title.
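For the 500 files, one way to glue extraction, naming and saving together is the loop below. It is a sketch under several assumptions: the pages are UTF-8 `.html` files in a single folder, the original HTML (not a processed `finalString`) is what should be written out, and a non-greedy, case-insensitive regex is good enough for finding the title. `save_by_title`, the folder names and the `-2`, `-3` suffixes for duplicate titles are all hypothetical choices.

```python
import re
from pathlib import Path

# Non-greedy, case-insensitive, and DOTALL so a title may span lines.
TITLE_RE = re.compile(r'<title[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)

def title_to_filename(title: str) -> str:
    # Lowercase and turn each run of non-alphanumerics into one hyphen.
    slug = re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')
    return f"{slug or 'untitled'}.html"

def save_by_title(src_dir: str, dest_dir: str) -> list:
    """Copy every *.html file in src_dir to dest_dir, named after its title."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob("*.html")):
        html = path.read_text(encoding="utf-8")
        match = TITLE_RE.search(html)
        if match is None:
            print(f"skipping {path.name}: no <title> tag")
            continue
        title = " ".join(match.group(1).split())
        target = dest / title_to_filename(title)
        # If two pages share a title, add a counter instead of overwriting.
        stem, counter = target.stem, 2
        while target.exists():
            target = dest / f"{stem}-{counter}.html"
            counter += 1
        target.write_text(html, encoding="utf-8")
        written.append(target)
    return written

# Example call (folder names are placeholders):
# save_by_title("pages", "renamed")
```

Because existing files are never overwritten, running it twice into the same output folder produces `-2` copies; point it at an empty folder or drop the `while` loop if overwriting is what you want.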


Solution

  • Update

    Are you looking for this?

    >>> import re
    >>> html = """<title>My name is Prince</title>"""
    >>> re.search(r'<title>(?P<title>.+)</title>', html).group('title').replace(' ', '-').lower()
    'my-name-is-prince'
    

    Old answer: if you have already extracted the title from the HTML, you can do:

    title = 'My name is Prince'
    filename = f"{title.lower().replace(' ', '-')}.html"
    
    with open(filename, "w") as some_file_handle:
        some_file_handle.write(finalString)