Still quite new to this whole Python thing. Here's what I'm trying to do:
It's SORT of like the question "Extracting words from txt file using python", but instead of taking out words between single quotes, I need to take out words between double quotes that FOLLOW a certain word.
Right now, I have the script scraping a website and saving the HTML. Works great. No problem. Then I have BeautifulSoup parse the HTML and search for all the tables on the page that contain the data I need. Below is an example of one of the table lines:
<td style="background-color:red;w ...blahblahblah... margin:0px;background:none" title="Bland NB" type="button" value="TRX"/>
BeautifulSoup lays out the HTML so there is one table line per line (if that makes sense), and I use a regex search to pull out only the table lines that contain "background-color:red", since the red ones are the only ones whose titles I need. I just need the script to go through line by line (there are ~350 lines just like the one above, but with different titles), extract the value in quotes right after 'title=', and save all of them to a text file, one title per line.
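Since the filtering step already uses a regex, one option is to hand that regex straight to BeautifulSoup as an attribute filter, so it both selects the red cells and reads their `title` attribute without any string slicing. This is a minimal sketch under assumptions: the sample HTML below is hypothetical (style values shortened), and the output file name `titles.txt` is made up. The `\s*` in the pattern is an addition so that a style written as `background-color: red` (with a space) is also caught.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample: three cells, two of which have a red background.
html = """
<table><tr>
<td style="background-color:red;margin:0px" title="Bland NB" type="button" value="TRX"></td>
<td style="background-color:green;margin:0px" title="Other" type="button" value="TRX"></td>
<td style="margin:0px; background-color: red" title="Spaced Red" type="button" value="TRX"></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# A compiled regex passed as an attribute filter is matched with .search()
# against each <td>'s style value; \s* tolerates "background-color: red".
red_style = re.compile(r"background-color:\s*red")
red_cells = soup.find_all("td", style=red_style)

# Read the title attribute from each matching cell, skipping any without one.
titles = [td["title"] for td in red_cells if td.has_attr("title")]

# Hypothetical output file: one title per line.
with open("titles.txt", "w") as f:
    for title in titles:
        f.write(title + "\n")

print(titles)  # ['Bland NB', 'Spaced Red']
```

Working on the parsed tags rather than on printed lines means it doesn't matter how the HTML happens to be laid out on the page.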
I think BeautifulSoup might be able to do it. I've been wrestling with partition and strip functions but can't get them to do what I want them to do. I also think I might be able to use a regex to do it but that's a can o' worms in and of itself.
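For comparison, here is roughly how the partition and regex ideas could look on a single raw line of text. It's a sketch that assumes each line holds exactly one tag and that `title` is always double-quoted; both helper names are hypothetical. Regexes over HTML break easily (single quotes, attributes split across lines), so the parsed-HTML route is usually safer.

```python
import re

# One table line, as in the question (style value shortened).
line = '<td style="background-color:red;margin:0px" title="Bland NB" type="button" value="TRX"/>'

def title_by_partition(line):
    """Hypothetical helper: split once at 'title="', then once at the next quote."""
    _, sep, after = line.partition('title="')
    if not sep:  # no title attribute on this line
        return None
    value, _, _ = after.partition('"')
    return value

# Capture everything after title=" up to the closing double quote.
TITLE_RE = re.compile(r'\btitle="([^"]*)"')

def title_by_regex(line):
    """Hypothetical helper: regex version of the same extraction."""
    match = TITLE_RE.search(line)
    return match.group(1) if match else None

print(title_by_partition(line))  # Bland NB
print(title_by_regex(line))      # Bland NB
```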
I'm so close! Any help greatly appreciated!!
Thanks!!
EDIT
I can't post more of the code as it contains company IPs and info that I can't put out in the wild. Sorry.
--Brent
html = """
<td style="background-color:red;w ...blahblahblah... margin:0px;background:none" title="Bland NB" type="button" value="TRX"/>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html)
if "background-color:red" in td.get("style"):
print soup.td.get("title")
Bland NB
To put it all together:
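from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
all_tds = soup.find_all("td")
with open("out.txt", "a") as f:
    for td in all_tds:
        # Check each cell's own style (soup.td would always be the first <td>)
        if "background-color:red" in td.get("style", ""):
            title = td.get("title")
            if title is not None:
                f.write(title + "\n")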