I scrape websites with Selenium and put the content into pandas so it's easy to work with. My only problem is that when I use the .text attribute on a Selenium WebElement, all the special HTML characters are kept, and I can't delete them because they are invisible. Is there a way to remove them all while scraping?
Thank you all!
I encountered a similar problem a while ago. Without any reproducible code or HTML it's a bit hard to say, but the best way I found to remove special characters was to execute a JS script:
driver.execute_script("var element = document.getElementsByClassName('<class_name>');for (var i = element.length - 1; i >= 0; --i) {element[i].remove();}")
Replace <class_name> with the name of the class you would like to remove. Now you can grab the WebElement you need without worrying about special characters.
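For context, here is a minimal end-to-end sketch of that approach, from the JS cleanup through to pandas. The URL and the class names hidden-char and row are placeholders for illustration, so adjust them to your page:

from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd

driver = webdriver.Chrome()
driver.get("https://example.com")  # replace with the page you are scraping

# Remove every element with the unwanted class before reading .text
driver.execute_script(
    "var element = document.getElementsByClassName('hidden-char');"
    "for (var i = element.length - 1; i >= 0; --i) {element[i].remove();}"
)

# Grab the cleaned text and load it into pandas
rows = [el.text for el in driver.find_elements(By.CLASS_NAME, "row")]
df = pd.DataFrame({"text": rows})

driver.quit()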