python-2.7 beautifulsoup html-parsing

How to write bs4 output to a text file the same way it appears in the terminal


I am using bs4 for the first time. If I use this basic code:

    from bs4 import BeautifulSoup
    with open('test.txt', 'r') as f:
        soup = BeautifulSoup(f)
        print f

The output in the terminal is very clean and doesn't include HTML tags. If I try to write it to a text file, bs4 prompts me to specify a parser, so I added 'html.parser'. The result is not the same, though: the file is full of the tags I'm trying to get rid of. How can I get the same tag-free result in my text file?

    from bs4 import BeautifulSoup
    with open('test.txt', 'r') as f:
        soup = BeautifulSoup(f, 'html.parser')
        with open('test2.txt', 'w') as x:
            x.write(str(soup))

EDIT: Here's an example of what ends up in test2.txt when I run this code:

    each\u00a0row you want to accept.\n <li>At the top of the list, 
    under the <b>Batch Actions</b> drop-down arrow, 
    choose\u00a0<b>Accept Selected</b>.</li>\n <li>All the selected 
    transactions\u00a0move from the <b>For Review

but in the terminal I get:

    each\u00a0row you want to accept.\n At the top of the list, under 
    the Batch Actions drop-down arrow, choose\u00a0Accept Selected.\n 
    All the selected transactions\u00a0move from the For Review 
    tab\u00a0to the In QuickBooks 

Solution

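  • str(soup) serializes the whole parse tree, tags included. Try the .text attribute instead (a shorthand for soup.get_text()), which returns only the text with the tags stripped: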

    x.write(str(soup.text))
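
For a slightly fuller picture, here is a minimal sketch of the same idea that should run on both Python 2.7 and Python 3. The helper names html_to_text and strip_tags_to_file are made up for illustration, and the UTF-8 encoding is an assumption about the input file. The sketch writes the text through io.open with an explicit encoding rather than wrapping it in str(), because on Python 2 calling str() on a unicode string that contains real non-ASCII characters raises UnicodeEncodeError.

```python
# -*- coding: utf-8 -*-
import io

from bs4 import BeautifulSoup


def html_to_text(markup):
    """Return only the text of `markup`, with every tag removed.

    Hypothetical helper; get_text() is what the .text attribute calls.
    """
    soup = BeautifulSoup(markup, 'html.parser')
    return soup.get_text()


def strip_tags_to_file(src_path, dst_path):
    """Read HTML from src_path and write its tag-free text to dst_path."""
    # io.open takes an encoding on both Python 2.7 and 3.
    # UTF-8 is an assumption; use whatever encoding the file really has.
    with io.open(src_path, 'r', encoding='utf-8') as f:
        text = html_to_text(f.read())
    with io.open(dst_path, 'w', encoding='utf-8') as out:
        out.write(text)
    return text
```

With the files from the question, the call would be strip_tags_to_file('test.txt', 'test2.txt'). Keeping the tag stripping in its own function makes it easy to check on a short string before pointing it at a real file.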