html, security, verification

What in an HTML page is unique?


My question is mainly about verification. What can be used to determine what is unique about an HTML document? (The document may be partly dynamic.)

What can be used, or generated, to recognize that a page is the correct page with an accuracy of, say, 99%, given that you can store a "fingerprint" of sorts for the page you are verifying?


For clarity, this is in addition to encryption, HTTPS, etc. The page can and will change with dynamic content for specific users, and so can the fingerprint, but a single fingerprint cannot match the page exactly for every user because of that dynamic content. Therefore a hash cannot work here, at least not in a simple form.
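To illustrate why a plain hash breaks down on dynamic pages, here is a minimal sketch. The two page bodies and the helper name `exact_hash` are made up for the example; SHA-256 is just one possible hash choice.

```python
import hashlib


def exact_hash(page: bytes) -> str:
    """Return the SHA-256 hex digest of the raw page bytes."""
    return hashlib.sha256(page).hexdigest()


# Hypothetical pages: identical except for the per-user greeting.
page_for_alice = b"<html><body><p>Welcome, alice</p></body></html>"
page_for_bob = b"<html><body><p>Welcome, bob</p></body></html>"

# A single changed word is enough to produce a completely different digest,
# so one stored hash cannot match both users' versions of the page.
same = exact_hash(page_for_alice) == exact_hash(page_for_bob)
print(same)  # False
```

Any per-user or per-request content (names, timestamps, tokens) has the same effect, which is why an exact hash only verifies pages that are served byte-for-byte identically.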


Solution

  • A unique fingerprint of an HTML page is easy to calculate. Build a hash from the following:

    • Protocol: http or https
    • URL: domain + URI
    • Query string
    • The exact contents of the page, down to the byte

    Optionally, some headers:

    • Server
    • Content-Type (this is important)
    • Content-Encoding (probably this too)
    • more ideas? Feel free to edit them in.

    This assumes you're not POSTing any data to the pages.
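The recipe above could be sketched roughly as follows in Python. This is only one possible reading of it: the function name `page_fingerprint`, the choice of SHA-256, the length-prefixing of each field, and the example URL are all assumptions, not part of the answer. Because the body is hashed byte-for-byte, any dynamic content will change the result.

```python
import hashlib
from urllib.parse import urlsplit


def page_fingerprint(url, body, headers=None,
                     header_names=("Server", "Content-Type", "Content-Encoding")):
    """Hash protocol, host, path, query string, selected headers, and body bytes."""
    parts = urlsplit(url)
    # Header names are case-insensitive, so normalise them before lookup.
    lookup = {name.lower(): value for name, value in (headers or {}).items()}

    fields = [parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query]
    fields += [lookup.get(name.lower(), "") for name in header_names]

    digest = hashlib.sha256()
    for field in fields:
        data = field.encode("utf-8")
        # Length-prefix each field so adjacent fields cannot blur together.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    digest.update(len(body).to_bytes(8, "big"))
    digest.update(body)  # the exact page contents, byte for byte
    return digest.hexdigest()


# Usage with a hypothetical page:
fp = page_fingerprint(
    "https://example.com/page?id=1",
    b"<html></html>",
    {"Content-Type": "text/html"},
)
```

Changing the protocol, the query string, a listed header, or a single byte of the body yields a different fingerprint, so this only suits pages whose content is identical across requests.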