
How should I represent the type of a string containing a path to a file in type annotations?


I'm writing a library that provides a class that's meant to be subclassed by end users. The class's constructor cares about the types of the arguments of the methods added to the subclass. I want the end users to be able to indicate the arguments' types with type annotations.
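
For context, here is a minimal sketch of how a base-class constructor could read those annotations. The internals of `mylib.MyClass` are not shown in this question, so the `MyClass` body, the `path_args` attribute, and the extra `mode` argument are all hypothetical; `FilePath` is based on plain `str` to keep the sketch simple.

```python
import inspect
from typing import NewType, get_type_hints

FilePath = NewType("FilePath", str)


class MyClass:
    """Hypothetical stand-in for mylib.MyClass."""

    def __init__(self) -> None:
        # Map each public method name to its FilePath-annotated argument names.
        self.path_args = {}
        for name, method in inspect.getmembers(type(self), inspect.isfunction):
            if name.startswith("_"):
                continue  # skip __init__ and other private/dunder methods
            hints = get_type_hints(method)
            self.path_args[name] = [
                arg for arg, hint in hints.items()
                if arg != "return" and hint is FilePath
            ]


class MySubclass(MyClass):
    def my_method(self, path: FilePath, mode: str = "r"):
        return open(path, mode)


print(MySubclass().path_args)  # {'my_method': ['path']}
```

The identity check `hint is FilePath` only recognizes arguments annotated with that exact object, which is why the choice of annotation matters to the library.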

One of the things my class cares about is whether an argument is a string that contains a path to a file. Logically, this should be a subtype of str, which can be indicated like so:

import mylib
from typing import AnyStr, NewType

FilePath = NewType('FilePath', AnyStr)

class MySubclass(mylib.MyClass):
    def my_method(self, path: FilePath):
        return open(path)

But the docstring of typing.NewType gives the following examples:

from typing import NewType

UserId = NewType('UserId', int)

def name_by_id(user_id: UserId) -> str:
    ...

name_by_id(42)          # Fails type check
name_by_id(UserId(42))  # OK

So, in order for a static type checker to not fail code that uses my library, users would have to do the following:

from mylib import *

... # MySubclass defined as above

o = MySubclass()
o.my_method(FilePath('foo/bar.baz'))

But I want them to be able to simply do

o.my_method('foo/bar.baz')

without a static type checker reporting errors. My concern is more about getting the semantics of the types I'm defining right than about anybody actually running a static type checker on code that uses my library.

One solution is to define FilePath as

FilePath = Union[AnyStr, NewType('FilePath', AnyStr)]

but this is confusing to look at, and its __repr__ is a straight-up lie:

>>> FilePath
Union[AnyStr, FilePath]

Is there a better way?


Solution

  • Your two goals are incompatible: you can't specify that your method accepts only specific path-like objects (or even a specific subtype of str) while also allowing the caller to pass in an arbitrary str directly.

    You need to pick one of the two.

    If you decide to go with the former (the method accepts only specific path-like objects), a more palatable alternative to NewType might be to have your method accept only pathlib.Path objects:

    from pathlib import Path
    
    class MyClass:
        def my_method(self, x: Path) -> None: ...
    
    MyClass().my_method(Path("foo/bar.baz"))
    

    Your callers will still need to convert their strings into these Path objects, but at least now they'll gain some actual runtime benefit from doing so.
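
To illustrate the runtime difference, here is a small sketch contrasting the two. It assumes a `FilePath` NewType based on plain `str` (a simplification of the one in the question); the variable names are only illustrative.

```python
from pathlib import Path
from typing import NewType

# Assumption: FilePath wraps plain str, a simplified form of the question's type.
FilePath = NewType("FilePath", str)

wrapped = FilePath("foo/bar.baz")
# NewType is erased at runtime: the call returns its argument unchanged.
print(type(wrapped) is str)  # True

path = Path("foo/bar.baz")
# Path parses the string and exposes its components.
print(path.name)                      # bar.baz
print(path.suffix)                    # .baz
print(path.parent.name)               # foo
print(path.with_suffix(".txt").name)  # bar.txt
```

So wrapping a string in a NewType only satisfies the type checker, whereas constructing a Path gives the caller an object with path-aware behavior.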

    If you decide to go with the latter goal (allow users to pass in strings directly), you might as well get rid of all of your NewTypes and switch to using str (or Union[Text, bytes] or AnyStr) directly. It would be a more honest type signature:

    class MyClass:
        def my_method(self, x: str) -> None: ...
    
    MyClass().my_method("foo/bar.baz")
    

    You could perhaps make this slightly more readable by using type aliases, like so:

    MaybeAPath = str
    
    class MyClass:
        def my_method(self, x: MaybeAPath) -> None: ...
    
    MyClass().my_method("foo/bar.baz")
    

    ...but this is a readability improvement only: a type checker treats MaybeAPath exactly like str. To be robust, your code and your subclassers' code would still need error handling for any arbitrary string that can't be interpreted as a path.
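
A minimal sketch of what that error handling could look like, assuming the method needs a path to an existing file. The helper name `resolve_existing_file` and the specific checks are hypothetical, not part of any library.

```python
import tempfile
from pathlib import Path

MaybeAPath = str  # plain alias: type checkers treat it exactly like str


def resolve_existing_file(raw: MaybeAPath) -> Path:
    """Hypothetical helper: turn a string into a Path, or raise if unusable."""
    # Reject strings that cannot name a file at all.
    if not raw or "\0" in raw:
        raise ValueError(f"not a usable path: {raw!r}")
    path = Path(raw)
    # Reject well-formed paths that do not point at an existing regular file.
    if not path.is_file():
        raise FileNotFoundError(f"no such file: {raw!r}")
    return path


# Usage: an existing temporary file passes, a missing one raises.
with tempfile.NamedTemporaryFile(suffix=".baz") as handle:
    print(resolve_existing_file(handle.name).suffix)  # .baz

try:
    resolve_existing_file("definitely/not/here.baz")
except FileNotFoundError as exc:
    print(exc)
```

Whether a missing file should be an error depends on what the method does with the path; a method that creates files would want a different check.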

    For what it's worth, I personally lean toward using pathlib.Path objects everywhere.