Tags: python, html, nltk

How can I extract the words starting with "icon" from HTML code using Python?


I need Python code that extracts selected words from the HTML below.

<a class="tel ttel">
<span class="mobilesv icon-hg"></span>
<span class="mobilesv icon-rq"></span>
<span class="mobilesv icon-ba"></span>
<span class="mobilesv icon-rq"></span>
<span class="mobilesv icon-ba"></span>
<span class="mobilesv icon-ikj"></span>
<span class="mobilesv icon-dc"></span>
<span class="mobilesv icon-acb"></span>
<span class="mobilesv icon-lk"></span>
<span class="mobilesv icon-ba"></span>
<span class="mobilesv icon-nm"></span>
<span class="mobilesv icon-ba"></span>
<span class="mobilesv icon-yz"></span>
</a>

I need to extract the words that start with "icon".

The output I require is:

icon-hg, icon-rq, icon-ba, icon-rq, icon-ba, icon-ikj, icon-dc, icon-acb, icon-lk, icon-ba, icon-nm, icon-ba, icon-yz


Solution

  • For your specific case you can get it as shown below. However, I recommend using Beautiful Soup for more general problems. Remember: "Special cases aren't special enough to break the rules."

    text = """
    <a class="tel ttel">
    <span class="mobilesv icon-hg"></span>
    <span class="mobilesv icon-rq"></span>
    <span class="mobilesv icon-ba"></span>
    <span class="mobilesv icon-rq"></span>
    <span class="mobilesv icon-ba"></span>
    <span class="mobilesv icon-ikj"></span>
    <span class="mobilesv icon-dc"></span>
    <span class="mobilesv icon-acb"></span>
    <span class="mobilesv icon-lk"></span>
    <span class="mobilesv icon-ba"></span>
    <span class="mobilesv icon-nm"></span>
    <span class="mobilesv icon-ba"></span>
    <span class="mobilesv icon-yz"></span>
    </a>
    """
    
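    # Split on whitespace, keep the tokens that begin with "icon" (e.g. 'icon-hg"></span>'),
    # then cut each one at the closing quote. This relies on the icon class not being
    # the first class in the attribute; otherwise the token would begin with 'class="'.
    result = [word.split('"')[0] for word in text.split() if word.startswith('icon')]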
    
    print(result)
    

    Output:

    ['icon-hg', 'icon-rq', 'icon-ba', 'icon-rq', 'icon-ba', 'icon-ikj', 'icon-dc', 'icon-acb', 'icon-lk', 'icon-ba', 'icon-nm', 'icon-ba', 'icon-yz']
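
The one-liner above depends on the exact whitespace and class order of this snippet. As a minimal sketch of the more general route, the version below parses the markup and inspects each `class` attribute. It uses the standard library's `html.parser` rather than Beautiful Soup so that it runs without extra installs; the names `IconClassCollector` and `extract_icon_classes` are made up for this illustration.

```python
from html.parser import HTMLParser


class IconClassCollector(HTMLParser):
    """Collect every class name that starts with a given prefix (hypothetical helper)."""

    def __init__(self, prefix="icon"):
        super().__init__()
        self.prefix = prefix
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None for bare attributes.
        for name, value in attrs:
            if name == "class" and value:
                self.matches.extend(
                    cls for cls in value.split() if cls.startswith(self.prefix)
                )


def extract_icon_classes(html, prefix="icon"):
    """Return matching class names in document order, duplicates included."""
    parser = IconClassCollector(prefix)
    parser.feed(html)
    parser.close()
    return parser.matches


# Shortened sample; note that the first span lists the icon class first.
sample = (
    '<a class="tel ttel">'
    '<span class="icon-hg mobilesv"></span>'
    '<span class="mobilesv icon-rq"></span>'
    "</a>"
)
print(", ".join(extract_icon_classes(sample)))
```

Because it reads the attribute value instead of raw text, this handles the icon class appearing in any position within `class`, and it keeps repeated entries, which matches the required output.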