
Extracting a word from text file using Python


Still quite new to this whole Python thing. Here's what I'm trying to do:

Extracting words from txt file using python

SORT of like that, but instead of taking out words between single quotes, I need to take words out of double quotes FOLLOWING a certain word.

Right now, I have the script scraping a website and saving the HTML. Works great. No problem. Then I have BeautifulSoup arranging the HTML and searching for all the tables on the page that contain the data I need. Below is an example of one of the table lines:

<td style="background-color:red;w ...blahblahblah... margin:0px;background:none" title="Bland NB" type="button" value="TRX"/>

BeautifulSoup arranges all the HTML as one table line per line (if that makes sense), and I use a regex to pull out only the table lines that have "background-color:red" in them, since the red ones are the only ones whose titles I care about. I just need the script to go through line by line (there are ~350 lines just like the one above, but with different titles), take out what's in quotes right after 'title=', and save all of it to a text file, one "title=" entry per line, if you know what I mean...

I think BeautifulSoup might be able to do it. I've been wrestling with partition and strip functions but can't get them to do what I want them to do. I also think I might be able to use a regex to do it but that's a can o' worms in and of itself.

I'm so close! Any help greatly appreciated!!

Thanks!!

EDIT

I can't post more of the code as it contains company IPs and info that I can't put out in the wild. Sorry.

--Brent


Solution

  • html = """
    <td style="background-color:red;w ...blahblahblah... margin:0px;background:none" title="Bland NB" type="button" value="TRX"/>
    """
    
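    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")
    
    # Check the style of the first <td>, then read its title attribute
    if "background-color:red" in soup.td.get("style", ""):
        print(soup.td.get("title"))  # prints: Bland NB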
    

    To put it all together:

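    soup = BeautifulSoup(html, "html.parser")
    
    all_tds = soup.find_all("td")
    
    # Append one title per line for every red cell
    with open("out.txt", "a") as f:
        for td in all_tds:
            # Use the loop variable td, not soup.td (which is always the first cell)
            if "background-color:red" in td.get("style", "") and td.get("title"):
                f.write(td["title"] + "\n")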
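
The question also mentions the regex route. That is usually more fragile than letting BeautifulSoup parse the attributes, but for comparison here is a rough sketch using only the standard library. It assumes each cell sits on its own line and that attribute values are always in double quotes, as in the example line. The names `TITLE_RE`, `red_titles` and the sample lines are made up for illustration.

```python
import re

# Matches title="..." and captures the text between the double quotes.
TITLE_RE = re.compile(r'\btitle="([^"]*)"')

def red_titles(lines):
    """Return the title of every line that mentions a red background."""
    titles = []
    for line in lines:
        if "background-color:red" not in line:
            continue  # skip cells that are not red
        match = TITLE_RE.search(line)
        if match:
            titles.append(match.group(1))
    return titles

# Hypothetical sample lines, modelled on the one in the question.
sample = [
    '<td style="background-color:red;margin:0px" title="Bland NB" type="button" value="TRX"/>',
    '<td style="background-color:green;margin:0px" title="Skip me" type="button" value="TRX"/>',
    '<td style="background-color:red;margin:0px" title="Other SB" type="button" value="TRX"/>',
]

print("\n".join(red_titles(sample)))
```

To save the result, write `"\n".join(red_titles(lines))` to a file instead of printing it. If the real HTML ever uses single quotes or splits a tag across lines, this pattern will miss titles, so the BeautifulSoup version is probably the safer choice.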