Tags: python, replace, beautifulsoup, special-characters

Replace \n and \t in BeautifulSoup text


Hello, I am using BeautifulSoup 4 and I am trying to remove the "\n" and "\t" characters from the soup text.

Here is my code:

from bs4 import BeautifulSoup as BS

# html_doc holds the HTML source to parse
soup = BS(html_doc, "html.parser")
for tableItem in soup.find_all("td"):
    result = str(tableItem.string)
    result = result.replace("\n\t", "")
    print(result)

This is my output:

\n', '\t\t\t\t\t\t\t\t\t\tTEXT_I_WANT\t\t\t\t\t\t\t\t\t

I tried several things with the encoding and with BeautifulSoup's NavigableString. Am I using the wrong encoding? Or does BeautifulSoup have special methods for this (such as stripped_strings)?

PS: I can replace TEXT_I_WANT, but not "\n" or "\t".


Solution

  • This line: result = result.replace("\n\t", "") looks for every occurrence of the two-character sequence \n\t and removes it. It does not match a \n or a \t on its own, so the runs of tabs around your text are left in place. It seems that what you want is:

    result = result.replace('\n', '')
    result = result.replace('\t', '')
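
To see the difference, here is a minimal sketch that uses only plain Python strings. The sample value raw is made up to resemble the output in the question, not taken from the real page. It also shows str.strip(), which may be enough when the unwanted whitespace only surrounds the text.

```python
# Hypothetical cell text shaped like the output in the question.
raw = "\n\t\t\tTEXT_I_WANT\t\t\t"

# replace() matches the whole substring "\n\t" only, so the other tabs survive.
pair_removed = raw.replace("\n\t", "")

# Removing each character separately clears every newline and tab.
each_removed = raw.replace("\n", "").replace("\t", "")

# strip() with no arguments trims all leading and trailing whitespace.
stripped = raw.strip()

print(repr(pair_removed))
print(repr(each_removed))
print(repr(stripped))
```

Inside the loop from the question, tableItem.get_text(strip=True) or iterating over tableItem.stripped_strings should give the same trimmed text directly from BeautifulSoup. Note also that tableItem.string is None when a cell has more than one child node, so str(tableItem.string) would then produce the literal text "None".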