from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
for tag in soup.find_all(True):
    # pseudocode: this is the kind of check I'd like to run on each tag's ID or class
    print(rules.should_block(tag.get('id') or tag.get('class')))
As far as I know, Adblock can block HTML elements based on their names (IDs or class names).
For example, if a div's ID is ads (selector #ads), it gets blocked.
How can I do something similar in Python?
To hide elements by class name, you need an element hiding filter like
domain.com##.classnamehere
To hide an element by ID, you need
domain.com###IDnamehere
Leave out the domain (for example ##.classnamehere) and the filter applies on every site.
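To make the filter format concrete, here is a minimal sketch that splits an element hiding filter into its domain list and CSS selector. It assumes only the simple domains##selector form; exception rules (#@#), domain exclusions and URL-blocking rules are simply skipped. parse_hiding_filter is a hypothetical helper, not part of any library.

```python
from typing import List, Optional, Tuple


def parse_hiding_filter(line: str) -> Optional[Tuple[List[str], str]]:
    """Split a filter like 'domain.com##.ad' into (domains, selector).

    Hypothetical helper: handles only the basic 'domains##selector' form
    and returns None for anything else.
    """
    line = line.strip()
    if "#@#" in line or "##" not in line:
        return None  # exception rules and URL-blocking rules are not handled here
    domain_part, selector = line.split("##", 1)
    # An empty domain part means the filter applies to every site.
    domains = [d for d in domain_part.split(",") if d]
    return domains, selector


print(parse_hiding_filter("domain.com##.classnamehere"))  # (['domain.com'], '.classnamehere')
print(parse_hiding_filter("domain.com###IDnamehere"))     # (['domain.com'], '#IDnamehere')
print(parse_hiding_filter("##.classnamehere"))            # ([], '.classnamehere')
```

Splitting on the first ## is what makes the ID case work: domain.com###IDnamehere leaves #IDnamehere as the selector.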
If you're trying to understand filters better, there's a good primer here: https://adblockplus.org/filters
If you're trying to understand which filters are affecting a particular site, there's a good filter search engine here: http://blockadblock.com/search-adblock-filters.php
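Wiring these filters into a Python parser is probably beyond the scope of this answer, but adblockparser has plenty of good documentation: https://github.com/scrapinghub/adblockparser
Keep in mind that its should_block() takes a URL, not a tag ID or class. As far as I know it is aimed at URL-blocking rules and does not apply element hiding (##) rules, so you would have to match those selectors against the parsed HTML yourself.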
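Matching element hiding selectors against parsed HTML yourself can be sketched with just the standard library's html.parser. This is an assumption-laden sketch, not adblockparser's API: it only understands bare #id and .class selectors, and the names matches and HiddenElementFinder are hypothetical. With BeautifulSoup you could instead call soup.select(selector), which supports much richer CSS.

```python
from html.parser import HTMLParser
from typing import List, Tuple


def matches(selector: str, tag_id: str, classes: List[str]) -> bool:
    """Check one simple selector ('#id' or '.class') against an element."""
    if selector.startswith("#"):
        return tag_id == selector[1:]
    if selector.startswith("."):
        return selector[1:] in classes
    return False  # complex selectors are out of scope for this sketch


class HiddenElementFinder(HTMLParser):
    """Record (tag, id, classes) for every start tag hit by a hiding selector."""

    def __init__(self, selectors: List[str]) -> None:
        super().__init__()
        self.selectors = selectors
        self.blocked: List[Tuple[str, str, List[str]]] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Attribute values can be None (e.g. <div id>), so fall back to "".
        tag_id = attr_map.get("id") or ""
        classes = (attr_map.get("class") or "").split()
        if any(matches(s, tag_id, classes) for s in self.selectors):
            self.blocked.append((tag, tag_id, classes))


html_doc = '<div id="ads">x</div><p class="text banner">y</p><div id="main">z</div>'
finder = HiddenElementFinder(["#ads", ".banner"])
finder.feed(html_doc)
for tag, tag_id, classes in finder.blocked:
    print(tag, tag_id, classes)
```

The selector list would typically come from the filters for the current domain plus the generic ## filters; fetching and parsing real filter lists is left out here.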