How to get real web trace data for academic research?

I want to collect web access data for academic research on networking. The data should follow the well-known Zipf distribution, but I don't know where to get it.

The newer the data, the better. I found some links in older papers, but many of those papers are so old that the links no longer work.


  • My suggestions for getting web trace data would be:

    • Public Datasets: Look for publicly available datasets from academic institutions or similar repositories, for example:
    1. https://archive.ics.uci.edu/ml/index.php
    2. https://www.kaggle.com/datasets
    • Organizations: Contact organizations that might be willing to send you datasets under certain conditions.

    • Government Resources: A lot of government agencies release datasets for research.

    • Internet Archive: The Internet Archive may have historical web data that could be useful.

    Remember to check that the data actually follows a Zipf distribution, and to follow the ethical and legal guidelines for using web trace data.
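
    Once you have a candidate trace, a quick way to see whether it looks Zipf-like is to rank the requested objects by popularity and fit a line to log(frequency) against log(rank). Here is a minimal sketch of that check, using only the Python standard library. The function name `zipf_exponent` and the synthetic `/page…` trace are made up for illustration, and a least-squares fit on a log-log plot is only a rough estimate, not a rigorous statistical test:

    ```python
    import math
    from collections import Counter


    def zipf_exponent(requests):
        """Estimate the Zipf exponent s from an iterable of requested objects (e.g. URLs).

        Fits log(frequency) = c - s * log(rank) by ordinary least squares
        and returns s. A value near 1 is the classic Zipf case.
        """
        # Request count per object, most popular first (rank 1 = most popular).
        counts = sorted(Counter(requests).values(), reverse=True)
        if len(counts) < 2:
            raise ValueError("need at least two distinct objects to fit a slope")

        xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
        ys = [math.log(count) for count in counts]

        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

        # The fitted slope is -s, so negate it.
        return -sxy / sxx


    # Hypothetical synthetic trace: the object at rank r is requested about 1000 / r times.
    trace = []
    for rank in range(1, 51):
        trace.extend([f"/page{rank}"] * round(1000 / rank))

    print(f"estimated Zipf exponent: {zipf_exponent(trace):.2f}")
    ```

    For a real trace you would pass in the URL (or object ID) column of the access log instead of the synthetic list. If the estimate is far from what your experiments assume, or the log-log plot is clearly not a straight line, the dataset is probably not a good fit for Zipf-based work.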