Why do replace(), re.sub() and strip() not work with this string?


I'm using BeautifulSoup to get a result from a webpage. I've converted the data object to a string, but I'm not able to trim it.

I've got the following string:

text = '\n\n\n This product is not available.\n \n'

I've tried three options to remove the newline characters:

  1. string=text.replace('\n','')

  2. string=text.strip('\n')

  3. import re
     string = re.sub('\n','', text)

Why is the string output still the same as text in all three cases? I don't understand the logic yet.

Does anyone know what's happening?

UPDATE: The full code, in case it helps to reproduce the problem:

import requests
from bs4 import BeautifulSoup
import re

resp = requests.get('https://soysuper.com/p/granola-con-avena-y-frutos-rojos-kellogg-s-special-k-320-g-320-g', headers={'User-Agent':'Chrome/44.0.2403.157','Accept-Language': 'es-ES, es;q=0.5'})
soup = BeautifulSoup(resp.content.decode('UTF-8'),'html.parser')

data = [element.text for element in soup.find_all("section", {"class": "display display--coco"})]

text=str(data)

#option1
string=text.replace('\n',' ')
#option2
string=text.strip('\n')
#option3
string = re.sub('\n','', text)

print(string)

Solution

  • Just use .getText(strip=True) on each element, instead of calling str() on the whole list.

    Here's how:

    import requests
    from bs4 import BeautifulSoup
    
    resp = requests.get('https://soysuper.com/p/granola-con-avena-y-frutos-rojos-kellogg-s-special-k-320-g-320-g', headers={'User-Agent':'Chrome/44.0.2403.157','Accept-Language': 'es-ES, es;q=0.5'})
    soup = BeautifulSoup(resp.content.decode('UTF-8'),'html.parser')
    
    data = [element.getText(strip=True) for element in soup.find_all("section", {"class": "display display--coco"})]
    print(data)
    

    Output:

    ['Este producto no está disponible en ningún supermercado online.']
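
As for why the three attempts in the question seem to do nothing: a likely cause is the text=str(data) line. data is a list, and str() of a list returns its repr, in which each newline is written as two characters (a backslash followed by n) rather than as a real newline. Searching that string for '\n' then finds nothing. The sketch below reproduces this without any network access; the data value is a hypothetical stand-in for what the list comprehension in the question returns, not the real page content.

```python
# Hypothetical stand-in for the list built by the comprehension in the question.
data = ['\n\n\n This product is not available.\n \n']

# str() on a list returns its repr: each newline becomes backslash + 'n'.
text = str(data)

has_real_newline = '\n' in text       # no actual newline characters remain
has_escaped_newline = '\\n' in text   # but the two-character escape is there

# replace() looks for a real newline, finds none, and returns text unchanged.
replace_did_nothing = text.replace('\n', '') == text

# Cleaning each element before any str() call works as expected.
cleaned = [item.strip() for item in data]

print(has_real_newline, has_escaped_newline, replace_did_nothing)
print(cleaned)
```

With no arguments, strip() removes leading and trailing whitespace of any kind, so each element is cleaned without ever building the repr string.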