
How is Python `pip install` cache file's ID string calculated?


When I run `pip install pillow==10.2.0` to install Pillow on my machine, the file downloaded from the server is named pillow-10.2.0-cp312-cp312-win_amd64.whl.

After the download, it is cached on my local disk:

C:\Users\chj\AppData\Local\pip\cache\http-v2\5\d\a\8\5\5da855ed79847734593562113083d79c2634b1696f0d635c02984eb4.body

I'd like to know how that ID string (used as the cache filename), 5da855ed79847734593562113083d79c2634b1696f0d635c02984eb4, is calculated.

It is not the SHA224 or SHA256 of the .body file.

sha224(5da855ed79847734593562113083d79c2634b1696f0d635c02984eb4.body)=98dede132d9782d07fd42cf70d1734984f8bfd5c60c6018903053d68

sha256(5da855ed79847734593562113083d79c2634b1696f0d635c02984eb4.body)=154e939c5f0053a383de4fd3d3da48d9427a7e985f58af8e94d0b3c9fcfcf4f9

Then what is it?


Solution

  • The ID string is the hex-encoded SHA-224 hash of the download URL:

    $ echo -n 'https://files.pythonhosted.org/packages/51/07/7e9266a59bb267b56c1f432f6416653b9a78dda771c57740d064a8aa2a44/pillow-10.2.0-cp312-cp312-win_amd64.whl' | openssl sha224 -hex
    SHA2-224(stdin)= 5da855ed79847734593562113083d79c2634b1696f0d635c02984eb4
    

    You didn't specify which version of pip you are using, so I checked out the latest version from https://github.com/pypa/pip and searched for '.body'. There are two references; the most promising is in file_cache.py:

        def get_body(self, key: str) -> IO[bytes] | None:
            name = self._fn(key) + ".body"
            try:
                return open(name, "rb")
            except FileNotFoundError:
                return None
    

    From there, _fn() calls encode() to hash the key, and the key itself is the cache_url passed to get_body():

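        @staticmethod
        def encode(x: str) -> str:
            # The cache key (the URL) is hashed with SHA-224 and hex-encoded.
            return hashlib.sha224(x.encode()).hexdigest()
    
        def _fn(self, name: str) -> str:
            # NOTE: This method should not change as some may depend on it.
            #       See: https://github.com/ionrock/cachecontrol/issues/63
            hashed = self.encode(name)
            # The first five hex characters become nested directories.
            parts = list(hashed[:5]) + [hashed]
            return os.path.join(self.directory, *parts)
    
        # Call site elsewhere in the vendored cachecontrol code:
        body_file = self.cache.get_body(cache_url)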
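
Since `echo -n` and `openssl` are not always available on Windows, the same calculation can be sketched in Python. This is only an illustration that mirrors the encode() and _fn() logic quoted above, not pip's own code: the helper names `cache_key` and `cache_body_path` are made up for this example, and the cache directory is taken from the path in the question.

```python
import hashlib
import os


def cache_key(url: str) -> str:
    """Hex SHA-224 digest of the URL, mirroring encode() above."""
    return hashlib.sha224(url.encode()).hexdigest()


def cache_body_path(cache_dir: str, url: str) -> str:
    """Rebuild the .body path the way _fn() and get_body() appear to."""
    hashed = cache_key(url)
    # First five hex characters become nested directories, then the full hash.
    parts = list(hashed[:5]) + [hashed]
    return os.path.join(cache_dir, *parts) + ".body"


if __name__ == "__main__":
    url = (
        "https://files.pythonhosted.org/packages/51/07/"
        "7e9266a59bb267b56c1f432f6416653b9a78dda771c57740d064a8aa2a44/"
        "pillow-10.2.0-cp312-cp312-win_amd64.whl"
    )
    # Assumption: the http-v2 cache directory shown in the question.
    cache_dir = r"C:\Users\chj\AppData\Local\pip\cache\http-v2"
    print(cache_key(url))
    print(cache_body_path(cache_dir, url))
```

If the URL matches the one pip actually requested, the printed key should equal the cache filename without its .body suffix, as in the openssl output above. A redirect or a different index mirror would give a different URL and therefore a different key.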