python web-scraping adblock

How can I match HTML tag IDs or class names using adblockparser rules?


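from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
for tag in soup.find_all(True):
    tag_id = tag.get('id')              # e.g. 'ads', or None
    tag_classes = tag.get('class', [])  # BeautifulSoup returns class as a list
    # How can tag_id or tag_classes be checked against the rules here?
    print(rules.should_block(...))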

As far as I know, Adblock can block an HTML element based on its ID or class name.

For example, a div whose ID is ads (CSS selector #ads) would be blocked.

How can I do something similar?


Solution

  • To hide elements by class name, use an element hiding filter:

    domain.com##.classnamehere
    

    To hide an element by ID, use:

    domain.com###IDnamehere

    Leaving out the domain (for example ##.classnamehere) applies the filter on every site.
    

    If you're trying to understand filters better, there's a good primer here: https://adblockplus.org/filters

    If you're trying to understand which filters are affecting a particular site, there's a good filter search engine here: http://blockadblock.com/search-adblock-filters.php

    Loading these filters into a Python parser is probably beyond the scope of this answer, but the adblockparser documentation is a good starting point: https://github.com/scrapinghub/adblockparser
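Note that adblockparser's rules.should_block() takes a URL (plus optional options), so it is not designed to receive a tag ID or class name. As a rough alternative, here is a minimal sketch that uses only the standard library. It assumes filters written in the Adblock Plus form domains##selector, handles only a bare #id or .class selector, and skips exception rules (#@#) and ~domain exclusions. The names HidingRule, parse_hiding_rule, applies_to, ElementCollector and hidden_elements are hypothetical helpers, not part of adblockparser or BeautifulSoup.

```python
from html.parser import HTMLParser
from typing import List, NamedTuple, Optional, Set, Tuple


class HidingRule(NamedTuple):
    domains: Tuple[str, ...]  # empty tuple means "every site"
    kind: str                 # "id" or "class"
    name: str                 # the ID or class name to match


def parse_hiding_rule(line: str) -> Optional[HidingRule]:
    """Parse 'domain.com##.cls' or 'domain.com###id'; return None if unsupported."""
    line = line.strip()
    if "#@#" in line or "##" not in line:
        return None  # exception rule, or not an element hiding filter at all
    domain_part, selector = line.split("##", 1)
    domains = tuple(d for d in domain_part.split(",") if d)
    if any(d.startswith("~") for d in domains):
        return None  # ~domain exclusions are not handled in this sketch
    if selector[:1] == "#":
        kind = "id"
    elif selector[:1] == ".":
        kind = "class"
    else:
        return None
    name = selector[1:]
    # Only a bare name is supported, not compound selectors like 'div.ad' or '.a .b'
    if not name or not all(c.isalnum() or c in "-_" for c in name):
        return None
    return HidingRule(domains, kind, name)


def applies_to(rule: HidingRule, host: str) -> bool:
    """True if the rule is generic or the host is one of its domains (or a subdomain)."""
    if not rule.domains:
        return True
    return any(host == d or host.endswith("." + d) for d in rule.domains)


class ElementCollector(HTMLParser):
    """Record (tag, id, classes) for every start tag in the document."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Tuple[str, Optional[str], Set[str]]] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # A valueless attribute arrives as None, so fall back to ""
        classes = set((attr_map.get("class") or "").split())
        self.elements.append((tag, attr_map.get("id"), classes))


def hidden_elements(html: str, host: str, filter_lines: List[str]):
    """Return (tag, id, sorted classes) for each element a supported rule would hide."""
    rules = [r for r in map(parse_hiding_rule, filter_lines) if r and applies_to(r, host)]
    collector = ElementCollector()
    collector.feed(html)
    collector.close()
    hits = []
    for tag, el_id, classes in collector.elements:
        for rule in rules:
            if (rule.kind == "id" and el_id == rule.name) or (
                rule.kind == "class" and rule.name in classes
            ):
                hits.append((tag, el_id, sorted(classes)))
                break  # one matching rule is enough
    return hits


html_doc = '<div id="ads">x</div><p class="story">y</p><span class="banner big">z</span>'
filters = ["domain.com###ads", "##.banner", "other.org##.story"]
print(hidden_elements(html_doc, "www.domain.com", filters))
# [('div', 'ads', []), ('span', None, ['banner', 'big'])]
```

Since element hiding selectors are CSS selectors, BeautifulSoup's soup.select() (bs4 4.7 or later) could replace the hand-written matching if you need more complex selectors.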