Tags: python, python-typing, h5py

How to extend type-hinting for a class in a stub file


I have this code, and it annoys me that I have to cast f twice:

    with h5py.File(computed_properties_path, "r") as f:
        # get the set of computed metrics
        computed_metrics = set()
        # iterating through the file iterates through the keys which are dataset names
        f = cast(Iterable[str], f)
        dataset_name: str
        for dataset_name in f:
            # re-cast it as a file
            f = cast(h5py.File, f)
            dataset_group = index_hdf5(f, [dataset_name], h5py.Group)
            for metric_name in dataset_group:
                logger.info(f"Dataset: {dataset_name}, Metric: {metric_name}")

I just want to be able to tell the static type checker that if I iterate through a file, I'll get back strings (which are keys to the groups and datasets in the file).

I've tried creating this .pyi stub to create a class that does this, but I get an error saying that File is not defined. My guess is that this is because Pylance now relies solely on my stub, rather than looking up extra definitions in the original file.

I've tried a lot of different options through Claude and ChatGPT, but can't quite seem to figure out how to extend the type-hinting so that Pylance knows that iterating through an h5py.File object will yield strings.


Solution

  • Just don't change the type of f; iterate directly over the return value of cast.

        with h5py.File(computed_properties_path, "r") as f:
            computed_metrics = set()
            for dataset_name in cast(Iterable[str], f):
                dataset_group = index_hdf5(f, [dataset_name], h5py.Group)
                for metric_name in dataset_group:
                    logger.info(f"Dataset: {dataset_name}, Metric: {metric_name}")
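
This works because typing.cast returns its argument unchanged at runtime and only affects the type the checker assigns to that one expression, so f itself keeps its h5py.File type. A minimal sketch of that behaviour, assuming a plain dict as a stand-in for the open file (fake_file and its contents are made up so the snippet runs without h5py):

```python
from typing import Iterable, cast

# A plain dict stands in for the open h5py.File here (hypothetical data).
fake_file = {"dataset_a": {"metric_1": 0.5}, "dataset_b": {"metric_2": 0.7}}

# cast() does nothing at runtime: it hands back the very same object.
names = cast(Iterable[str], fake_file)
same_object = names is fake_file

# Only the cast expression is seen as Iterable[str]; fake_file keeps its own
# type, so it can still be indexed like a mapping inside the loop.
seen = [(name, sorted(fake_file[name])) for name in names]
```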
    

    If you need the iterable under its own name elsewhere (computed_metrics is never used, so I assume this is a reduced version of your real code), bind it to a different name instead of rebinding f.

        with h5py.File(computed_properties_path, "r") as f:
            computed_metrics = set()
            f_itr = cast(Iterable[str], f)
            dataset_name: str
            for dataset_name in f_itr:
                dataset_group = index_hdf5(f, [dataset_name], h5py.Group)
                for metric_name in dataset_group:
                    logger.info(f"Dataset: {dataset_name}, Metric: {metric_name}")
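
If the same cast ends up repeated in several places, one option is to hide it in a small helper so the call sites stay clean. This is only a sketch: iter_names is a hypothetical name, not part of h5py, and a dict stands in for the open file so the snippet runs on its own.

```python
from typing import Any, Iterable, Iterator, cast


def iter_names(container: Any) -> Iterator[str]:
    """Iterate over a mapping-like container, typed as yielding str keys.

    Hypothetical helper, not part of h5py. The cast is a runtime no-op;
    it only tells the type checker what iteration produces.
    """
    return iter(cast(Iterable[str], container))


# A dict stands in for an open h5py.File (assumption for illustration).
names = list(iter_names({"dataset_a": 1, "dataset_b": 2}))
```

With a real file, the loop would read `for dataset_name in iter_names(f):`, and f would keep its h5py.File type for the index_hdf5 call.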