wandb: get a list of all artifact collections and all aliases of those artifacts


The wandb documentation doesn't seem to explain how to do this, although I'd imagine it's a fairly common use case.

The code below gets me most (but not all) of what I want, but it feels clunky. I'd have expected ArtifactCollection instances to expose an aliases property.

UPDATE: such a property does seem to be part of the SDK from v0.13.10 onwards: https://github.com/wandb/wandb/blob/v0.13.10/wandb/apis/public.py#L4050-L4053

import json
import os

import wandb

ENTITY = os.environ.get("WANDB_ENTITY")
API_KEY = os.environ.get("WANDB_API_KEY")

def get_model_artifacts(key=None):
    wandb.login(key=key if key is not None else API_KEY)
    api = wandb.Api(overrides={"entity": ENTITY})
    # All artifact collections of type "models" in the "train" project.
    model_names = list(
        api.artifact_type(type_name="models", project="train").collections()
    )
    for model in model_names:
        # Only the version tagged "latest" is fetched, so only its aliases are seen.
        artifact = api.artifact("train/" + model.name + ":latest")
        # Relies on the private _attrs dicts, which may change between SDK versions.
        model._attrs.update(artifact._attrs)
        model._attrs["metadata"] = json.loads(model._attrs["metadata"])
        model.aliases = [x["alias"] for x in model._attrs["aliases"]]
    return model_names

I could look into writing a custom GraphQL query if needed, or just stick with this clunky method.

Am I missing something? Is there a cleaner way to do this?

The one thing this clunky method misses is old aliases: it only looks at the latest version of each model and returns the aliases of that version (say, "latest" and "v4"). I'm not sure how this would or should be displayed, but I'd hoped to get the aliases that point to older versions of the artifact as well. This is less important, though.

EDIT: after a few hours of looking through their SDK code, I have this (I'm still not happy with how clunky it is):

import os
import re

import wandb

ENTITY = os.environ.get("WANDB_ENTITY")
API_KEY = os.environ.get("WANDB_API_KEY")

def get_model_artifacts(key=None):
    wandb.login(key=key if key is not None else API_KEY)
    api = wandb.Api(overrides={"entity": ENTITY})
    # All artifact collections of type "models" in the "train" project.
    model_artifacts = list(
        api.artifact_type(type_name="models", project="train").collections()
    )

    def get_alias_tuple(artifact_version):
        # Split one version's aliases into its "vN" alias and everything else.
        version = None
        aliases = []
        for a in artifact_version._attrs["aliases"]:
            if re.match(r"^v\d+$", a["alias"]):
                version = a["alias"]
            else:
                aliases.append(a["alias"])
        return version, aliases

    for model in model_artifacts:
        # artifact = api.artifact("train/" + model.name + ":latest")
        # model._attrs.update(artifact._attrs)
        # model._attrs["metadata"] = json.loads(model._attrs["metadata"])
        versions = model.versions()
        # Maps each "vN" alias to the other aliases pointing at that version.
        version_dict = dict(get_alias_tuple(version) for version in versions)
        model.version_dict = version_dict
        # Flat list of every alias across all versions of this collection.
        model.aliases = [
            x for key, val in model.version_dict.items() for x in [key] + val
        ]
    return model_artifacts
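
The alias-splitting step above can be tried in isolation. The following is a minimal sketch, assuming the aliases of one version are already available as plain strings rather than as the `_attrs` dicts used above; `split_aliases` and `VERSION_ALIAS` are hypothetical names, not part of the wandb SDK.

```python
import re

# Same pattern as in get_alias_tuple: "v" followed by one or more digits.
VERSION_ALIAS = re.compile(r"^v\d+$")


def split_aliases(aliases):
    """Return (version_alias, other_aliases) for one artifact version.

    `aliases` is assumed to be a list of plain strings such as
    ["v4", "latest"]; this helper is illustrative only.
    """
    version = None
    others = []
    for alias in aliases:
        if VERSION_ALIAS.match(alias):
            version = alias
        else:
            others.append(alias)
    return version, others


print(split_aliases(["v4", "latest", "best"]))  # ('v4', ['latest', 'best'])
```

Keeping this logic in a standalone function means it can be checked without logging in to wandb or touching private attributes.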

Solution

  • I'm Annirudh. I'm an engineer at W&B who helped build artifacts. Your solution is really close, but because it fetches the artifact through the latest alias, it only considers the aliases of that one version rather than those of all versions. You can get around that by looping over the versions:

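    import wandb

    # Values taken from the question; adjust to your own artifact type and project.
    TYPE = "models"
    PROJECT = "train"

    api = wandb.Api()
    collections = list(api.artifact_type(type_name=TYPE, project=PROJECT).collections())

    # Gather the aliases of every version in every collection, not just the latest.
    aliases = set()
    for coll in collections:
        for artifact in coll.versions():
            aliases.update(artifact.aliases)

    print(collections)
    print(aliases)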

    Currently, the documentation on collections is sparse, but we're polishing collections up in the public API and will release some docs for them shortly. These APIs aren't quite release-ready yet, so apologies for the rough edges.

    Please feel free to reach out to me directly in the future if you have any other questions regarding artifacts. Always happy to help.