joomla screen-scraping

How to manipulate a Joomla! website for easy screen scraping


I have permission from the owner of a Joomla! website (who knows nothing about web development) to extract the articles from the site (for real!).

I got the URLs from the RSS feed, but the feed does not include the full text.

Do you know a way to manipulate the index.php parameters to get the article as clean as possible?

The URL right now looks like:

http://www.example.com/index.php?option=com_content&task=view&id=2093&Itemid=1

Solution

  • Change your URL to use "index2.php" instead of "index.php". That strips away all the navigation and outputs only the content of the article.

    http://www.example.com/index2.php?option=com_content&task=view&id=2093&Itemid=1
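
If you are scraping many articles, the URL swap above can be automated. This is a minimal Python sketch, assuming every URL from the feed has the same index.php form as the example. The function names `to_index2_url` and `fetch_article_html` are made up for illustration and are not part of Joomla!.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def to_index2_url(url: str) -> str:
    """Return the URL with a trailing index.php in the path swapped for index2.php."""
    parts = urlsplit(url)
    path = parts.path
    # Only touch the script name; the query string is passed through unchanged.
    if path.endswith("/index.php"):
        path = path[: -len("index.php")] + "index2.php"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def fetch_article_html(url: str, timeout: float = 10.0) -> str:
    """Download the stripped-down page (hypothetical helper; makes a network request)."""
    with urlopen(to_index2_url(url), timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset (an assumption).
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    feed_url = "http://www.example.com/index.php?option=com_content&task=view&id=2093&Itemid=1"
    print(to_index2_url(feed_url))
```

Run `to_index2_url` over each link from the RSS feed, then pass the result to whatever HTML parser you prefer. URLs that do not end in index.php are returned unchanged.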