
How to identify static assets from an HTML page


I have been tasked with listing all static assets from a webpage. I understand that static assets are those files that never change at runtime.

Is there a systematic way to distinguish these from dynamic files?

If I have to list all the static assets, what would be the best starting point?


Solution

  • From the client's perspective, there is no systematic way to determine which part of an HTML response comes from a static file and which is generated at runtime by the server. The HTML standard does not make that distinction.

    That being said, most of the time you can guess which parts of the response come from static files. When the HTML response links to files such as .css, .js, .gif and so on, there's a pretty good chance that those links point to static files.

    But determining from the response alone which HTML element comes from a file on disk and which is built at runtime is not possible.

    The systematic approach is to analyze the server-side code that produces the HTML response, and determine from there which resources are static and which are dynamic.

    Edit: You added that you don't need to be 100% accurate. In that case, you could use Cloudflare's list of file extensions. Cloudflare caches these extensions by default because they are typically static.
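
As a starting point, the extension-based guess described above could be sketched roughly as follows. This is only a heuristic under stated assumptions: `LIKELY_STATIC_EXTENSIONS` is a small illustrative subset rather than Cloudflare's actual list (check their documentation for the current one), and the names `AssetCollector`, `likely_static_assets`, and `RESOURCE_TAGS` are made up for this example. It uses only the Python standard library.

```python
from html.parser import HTMLParser
from posixpath import splitext
from urllib.parse import urljoin, urlparse

# Assumption: an illustrative subset of extensions that are usually static.
# This is NOT the authoritative Cloudflare list.
LIKELY_STATIC_EXTENSIONS = {
    ".css", ".js", ".gif", ".png", ".jpg", ".jpeg",
    ".svg", ".ico", ".woff", ".woff2",
}

# Tags that load a resource as part of the page (plain <a> links are skipped).
RESOURCE_TAGS = {"link", "script", "img", "source", "video", "audio"}


class AssetCollector(HTMLParser):
    """Collects absolute URLs from src/href attributes of resource tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in RESOURCE_TAGS:
            return
        for name, value in attrs:
            if name in ("src", "href") and value:
                # Resolve relative references against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def likely_static_assets(html, base_url):
    """Return referenced URLs whose path extension suggests a static file."""
    collector = AssetCollector(base_url)
    collector.feed(html)
    assets = []
    for url in collector.urls:
        # Use only the path, so query strings like ?v=3 are ignored.
        extension = splitext(urlparse(url).path)[1].lower()
        if extension in LIKELY_STATIC_EXTENSIONS and url not in assets:
            assets.append(url)
    return assets


if __name__ == "__main__":
    page = """
    <link rel="stylesheet" href="/css/site.css?v=3">
    <script src="app.js"></script>
    <img src="/avatar.php?id=7">
    <a href="/about.html">About</a>
    """
    for asset in likely_static_assets(page, "https://example.com/blog/"):
        print(asset)
```

Keep in mind that this only guesses: a URL ending in .css can still be generated at runtime, and a URL such as /avatar.php may well serve an unchanging image. It also only sees references in the HTML itself, not assets loaded from within CSS or JavaScript.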