php scraper rate

HTML tag counting - rate of change formula


I've been trying to find a statistics-style formula for calculating the rate of change of HTML tags that are added to or removed from various websites.

So, for example, with the scraper I'm writing, I obtain the initial tag count and then cache that value. Later, on the next round, I compare the current tag count with the cached one and calculate the rate of change as a percentage based on the difference between the two.

Other factors are included here, such as the number of times the website has been scraped, as well as the dates on which these scrapes occur, etc.
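
To make the comparison concrete, here is a minimal Python sketch of one way it could be computed: a plain percentage change between the cached and current counts, optionally divided by the number of days between the two scrapes. The function names (`count_tags`, `percent_change`, `percent_change_per_day`) are made up for illustration, and dividing by elapsed days is only one assumed way of factoring in the scrape dates, not an established formula.

```python
from datetime import datetime
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every opening tag encountered while parsing."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <br/> also pass through here once.
        self.count += 1


def count_tags(html):
    """Return the number of opening tags in an HTML string."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.count


def percent_change(previous, current):
    """Relative change from the cached count to the current count, in percent."""
    if previous == 0:
        raise ValueError("relative change is undefined for a previous count of 0")
    return (current - previous) * 100.0 / previous


def percent_change_per_day(previous, current, scraped_before, scraped_now):
    """Percent change divided by the days elapsed between the two scrapes.

    Assumption: a linear per-day normalisation is good enough for comparison.
    """
    days = (scraped_now - scraped_before).total_seconds() / 86400.0
    if days <= 0:
        raise ValueError("the current scrape must be later than the previous one")
    return percent_change(previous, current) / days


if __name__ == "__main__":
    cached = 200   # tag count stored on the previous round
    current = 230  # tag count obtained on this round
    print(percent_change(cached, current))
    print(percent_change_per_day(
        cached, current, datetime(2024, 1, 1), datetime(2024, 1, 11)))
```

A negative result means tags were removed. Normalising by elapsed time keeps a site scraped weekly comparable with one scraped daily.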

What would be the ideal formula for something of this nature?


Solution

  • Counting tags is fine. In addition, you could look at trees of nested table or div elements and measure their depth.

    For example:

    <div>
      <div>
        <div> .. </div>
      </div>
    </div>
    Here the nesting depth is 3.
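
The depth measurement could be sketched in Python with the standard library's `html.parser`. The class and function names (`DepthTracker`, `max_nesting_depth`) are hypothetical, and the sketch assumes reasonably well-formed markup in which every tracked tag is closed.

```python
from html.parser import HTMLParser


class DepthTracker(HTMLParser):
    """Tracks how deeply one tag name (e.g. div or table) nests inside itself."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name.lower()  # the parser reports tag names in lowercase
        self.current = 0   # how many tracked tags are open right now
        self.deepest = 0   # largest value `current` has reached

    def handle_starttag(self, tag, attrs):
        if tag == self.tag_name:
            self.current += 1
            self.deepest = max(self.deepest, self.current)

    def handle_endtag(self, tag):
        # Ignore stray closing tags so the counter never goes negative.
        if tag == self.tag_name and self.current > 0:
            self.current -= 1


def max_nesting_depth(html, tag_name="div"):
    """Return the maximum self-nesting depth of `tag_name` in an HTML string."""
    tracker = DepthTracker(tag_name)
    tracker.feed(html)
    tracker.close()
    return tracker.deepest


if __name__ == "__main__":
    sample = "<div><div><div> .. </div></div></div>"
    print(max_nesting_depth(sample))  # prints 3
```

The depth can be cached alongside the tag count and compared between scrapes in the same way, so that a structural change shows up even when the total number of tags stays the same.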