Tags: python, google-bigquery, package, pypi

Determining When a Package First Appeared in the PyPI Index using BigQuery?


I'm working on a project where I need to find out the initial release date of a package on the PyPI index. I came across the bigquery-public-data.pypi.distribution_metadata public dataset on BigQuery. From my understanding, there's a field called upload_time in this dataset that might indicate the upload timestamp of a package.

Can anyone confirm if the upload_time field refers to the time when a package was first uploaded to the PyPI index?

For reference, I found details about the PyPI BigQuery dataset here: https://warehouse.pypa.io/api-reference/bigquery-datasets.html

However, the PyPI documentation does not describe the table's fields.


Solution

  • In the table bigquery-public-data.pypi.distribution_metadata, the upload_time field represents the date and time the specific version of the package was uploaded.

    For example,

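    [image: query result for ‘google-cloud-bigquery’ showing version 0.20.0 with an upload date of 2016-09-29]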

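    From the picture above, we can see that the upload date of ‘google-cloud-bigquery’ version 0.20.0 is 2016-09-29, which means the first version, 0.20.0, of the ‘google-cloud-bigquery’ package was released on 2016-09-29. Because upload_time is recorded for each upload rather than once per package, the date a package first appeared on PyPI is the earliest upload_time across all of its rows. You can confirm this by checking the release history on the package's PyPI page.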
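To get the first-appearance date directly, one option is to take the minimum upload_time over all rows for the package. The following is a minimal sketch, not a definitive implementation. It assumes the google-cloud-bigquery client library is installed, that Google Cloud credentials and a billing project are already configured, and that the package name is passed exactly as it is stored in the name column. The names FIRST_UPLOAD_SQL and first_upload_time are made up for this illustration.

```python
# Sketch only: FIRST_UPLOAD_SQL and first_upload_time are hypothetical names.
# The query keeps the earliest upload_time among all rows of one package.
FIRST_UPLOAD_SQL = """
SELECT
  name,
  MIN(upload_time) AS first_upload_time
FROM `bigquery-public-data.pypi.distribution_metadata`
WHERE name = @package_name
GROUP BY name
"""


def first_upload_time(package_name, client=None):
    """Return the earliest upload_time for a package, or None if no rows match."""
    # Imported lazily so the module loads even without the client library.
    from google.cloud import bigquery

    # Assumption: default credentials and project are configured in the environment.
    client = client or bigquery.Client()

    # A query parameter avoids building the SQL string by hand.
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("package_name", "STRING", package_name)
        ]
    )
    rows = list(client.query(FIRST_UPLOAD_SQL, job_config=job_config).result())
    return rows[0].first_upload_time if rows else None
```

Calling first_upload_time("google-cloud-bigquery") should then return the timestamp of the earliest upload, which can be compared with the release history on PyPI. If the query returns nothing, the stored spelling of the name (case, hyphens versus underscores) may differ from the one passed in.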