
Why do the two Python APIs for accessing the APT cache return different sets of packages?


The python-apt package provides two APIs for accessing the APT cache:

  • apt_pkg.Cache

    A Cache object represents the cache used by APT which contains information about packages. The object itself provides no means to modify the cache or the installed packages, see the classes DepCache and PackageManager for such functionality.

  • apt.Cache

    The APT cache file contains a hash table mapping names of binary packages to their metadata. A Cache object is the in-core representation of the same. It provides access to APT's idea of the list of available packages.

It is unclear to me why they would contain different sets of packages, but indeed they do:

import apt
import apt_pkg

# Low-level cache: collect the full name of every package entry
cache = apt_pkg.Cache(apt.progress.base.OpProgress())
cache_pkgs = {pkg.get_fullname() for pkg in cache.packages}

# High-level cache: collect the full name of every package it iterates over
aptcache = apt.Cache(apt.progress.base.OpProgress())
aptcache_pkgs = {pkg.fullname for pkg in aptcache}

print(len(cache_pkgs), len(aptcache_pkgs))
# on my system, this outputs: 92488 64447

The latter does appear to be a subset of the former, though:

print(aptcache_pkgs - cache_pkgs)
# on my system, this outputs: set()

Some scripts, such as this one from Ubuntu, use both:

# we need another cache that has more pkg details
with apt.Cache() as aptcache:
    for pkg in cache.packages:
        aptcache[pkg.get_fullname()]

What is the distinction between these two methods of accessing the APT cache and why do they return different sets of packages?


Solution

  • Answer from Julian Andres Klode, a maintainer of the project:

    apt.Cache only includes real packages; the one in apt_pkg also has virtual packages. You can see in apt/cache.py how it filters the apt_pkg.Cache.
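
To see the split described in the answer, the low-level cache can be partitioned by hand. The following is only a rough sketch, not the actual code from apt/cache.py: it assumes that "has at least one version" is a reasonable stand-in for "real package", using the has_versions attribute of apt_pkg.Package. The helper names split_real_and_virtual and compare_with_system_cache are made up for this illustration.

```python
def split_real_and_virtual(packages):
    """Partition package records into (real, virtual) sets of full names.

    Each record must expose `has_versions` and `get_fullname()`, as
    apt_pkg.Package objects do. Assumption: an entry without any version
    is treated as virtual, although it may also be a leftover name that
    nothing provides any more.
    """
    real, virtual = set(), set()
    for pkg in packages:
        # Entries with at least one version count as real packages here
        target = real if pkg.has_versions else virtual
        target.add(pkg.get_fullname())
    return real, virtual


def compare_with_system_cache():
    """Run the partition against the local APT cache (needs python-apt)."""
    import apt
    import apt_pkg

    cache = apt_pkg.Cache(apt.progress.base.OpProgress())
    real, virtual = split_real_and_virtual(cache.packages)
    print(len(real), "entries with versions,", len(virtual), "without")
```

Calling compare_with_system_cache() on a Debian or Ubuntu machine with python-apt installed prints both counts. If the assumption holds, the first count should be close to the number of packages apt.Cache yields.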