
Trying to get a grip on CachedProperty in clang\cindex.py


This is related to another question I asked, which got no answer. I'm trying to understand what's going on under the hood of the Python bindings for libclang, and I'm having a really hard time doing so.

I've read tons of articles about both decorators and descriptors in Python to understand how the CachedProperty class in clang/cindex.py works, but I still can't put all the pieces together.

The most relevant texts I've found are one SO answer and a code recipe on ActiveState. They help a bit, but, as I mentioned, I'm still not there.

So, let's cut to the chase: I want to understand why I'm getting an AssertionError when creating a CIndex. I'll post only the relevant code here (cindex.py is 3646 lines long), and I hope I haven't left out anything relevant. My code has only one relevant line, which is:

index = clang.cindex.Index.create()

This refers to line 2291 in cindex.py, which reads:

return Index(conf.lib.clang_createIndex(excludeDecls, 0))

From there on, there's a series of function calls that I can't explain: I don't know why they happen or where they come from. I'll list the code and pdb output along with the questions relevant to each part:

(An important thing to note up front: conf.lib is defined like this:)

class Config:
    # ...snip...

    @CachedProperty
    def lib(self):
        lib = self.get_cindex_library()
        ...
        return lib

CachedProperty code:

class CachedProperty(object):
    """Decorator that lazy-loads the value of a property.

    The first time the property is accessed, the original property function is
    executed. The value it returns is set as the new value of that instance's
    property, replacing the original method.
    """

    def __init__(self, wrapped):
        self.wrapped = wrapped
        try:
            self.__doc__ = wrapped.__doc__
        except:
            pass

    def __get__(self, instance, instance_type=None):
        if instance is None:
            return self

        value = self.wrapped(instance)
        setattr(instance, self.wrapped.__name__, value)

        return value

Pdb output:

-> return Index(conf.lib.clang_createIndex(excludeDecls, 0))
(Pdb) s
--Call--
> d:\project\clang\cindex.py(137)__get__()
-> def __get__(self, instance, instance_type=None):
(Pdb) p self
<clang.cindex.CachedProperty object at 0x00000000027982E8>
(Pdb) p self.wrapped
<function Config.lib at 0x0000000002793598>
  1. Why is the next call after Index(conf.lib.clang_createIndex(excludeDecls, 0)) the CachedProperty.__get__ method? What about __init__?
  2. If the __init__ method isn't called, how come self.wrapped has a value?

Pdb output:

(Pdb) r
--Return--
> d:\project\clang\cindex.py(144)__get__()-><CDLL 'libcla... at 0x27a1cc0>
-> return value
(Pdb) n
--Call--
> c:\program files\python35\lib\ctypes\__init__.py(357)__getattr__()
-> def __getattr__(self, name):
(Pdb) r
--Return--
> c:\program files\python35\lib\ctypes\__init__.py(362)__getattr__()-><_FuncPtr obj...000000296B458>
-> return func
(Pdb)
  1. Where is the value returned by CachedProperty.__get__ supposed to go? And where does the call to the CDLL.__getattr__ method come from?

MOST CRITICAL PART, for me

(Pdb) n
--Call--
> d:\project\clang\cindex.py(1970)__init__()
-> def __init__(self, obj):
(Pdb) p obj
40998256

This is the __init__ of ClangObject, which the Index class inherits from.

  1. But where is there any call to __init__ with one parameter? Is obj the value that conf.lib.clang_createIndex(excludeDecls, 0) returns?
  2. Where is this number (40998256) coming from? I'm getting the same number over and over again. As far as I understand, it shouldn't be just a number but a clang.cindex.LP_c_void_p object, and that's why the assertion fails.

To sum up, what would help me most is a step-by-step walkthrough of the function calls here, because I'm feeling a little lost in all this...


Solution

  • The CachedProperty object is a descriptor object; its __get__ method is called automatically whenever Python looks up an attribute on an instance, doesn't find it in the instance's own __dict__, and instead finds it on the class as an object that has a __get__ method.
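    To see that rule in isolation, here is a minimal sketch with a made-up descriptor class (Descriptor and Host are hypothetical names, not from cindex.py) that records every call Python makes to its __get__ method:

```python
class Descriptor:
    """A minimal non-data descriptor: it defines __get__ but not __set__."""

    def __init__(self):
        self.calls = []  # records the (instance, owner) pairs passed to __get__

    def __get__(self, instance, owner=None):
        self.calls.append((instance, owner))
        if instance is None:
            # Looked up on the class itself (Host.attr): hand back the descriptor.
            return self
        # Looked up on an instance (h.attr): compute a value instead.
        return "computed"


class Host:
    attr = Descriptor()


h = Host()
print(h.attr)                              # prints computed
print(Host.attr is Host.__dict__["attr"])  # prints True
# Reading Host.__dict__ directly bypasses __get__, so only two calls were recorded:
print(Host.__dict__["attr"].calls == [(h, Host), (None, Host)])  # prints True
```

    Accessing the attribute on the class also goes through __get__, with instance set to None; that is the case the "if instance is None: return self" branch in CachedProperty handles.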

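    Using CachedProperty as a decorator means it is called once, and the CachedProperty instance it creates replaces the original function object on the Config class. It is the @CachedProperty line that causes CachedProperty.__init__ to be called, and that happens when the class body is executed (that is, when clang.cindex is imported), long before your Index.create() call; that is why you never see __init__ in your pdb session. The instance ends up on the Config class as Config.lib. Remember, the syntax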

    @CachedProperty
    def lib(self):
        # ...
    

    is essentially executed as

    def lib(self):
        # ...
    lib = CachedProperty(lib)
    

    so this creates an instance of CachedProperty() with lib passed in as the wrapped argument, and then Config.lib is set to that object.
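    You can watch the decorator run at class-creation time with a simplified copy of CachedProperty; the events list and the string returned by lib are hypothetical additions for illustration, and this Config is a stand-in, not the real one:

```python
events = []  # hypothetical log so we can see *when* each method runs


class CachedProperty(object):
    """Simplified copy of the CachedProperty from the question, plus logging."""

    def __init__(self, wrapped):
        events.append("__init__ for " + wrapped.__name__)
        self.wrapped = wrapped

    def __get__(self, instance, instance_type=None):
        if instance is None:
            return self
        events.append("__get__")
        value = self.wrapped(instance)
        setattr(instance, self.wrapped.__name__, value)
        return value


class Config:
    @CachedProperty  # same as writing lib = CachedProperty(lib) after the def
    def lib(self):
        return "pretend this is a loaded library"


# The decorator already ran while the class body was executed,
# before any Config instance exists:
print(events)  # prints ['__init__ for lib']

conf = Config()
conf.lib
print(events)  # prints ['__init__ for lib', '__get__']
```

    In cindex.py the same thing happens when the module is imported, which is why stepping through Index.create() only ever shows __get__.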

    You can see this in the debugger; one step up you could inspect type(config).lib:

    (Pdb) type(config)
    <class 'clang.cindex.Config'>
    (Pdb) type(config).lib
    <clang.cindex.CachedProperty object at 0x00000000027982E8>
    

    In the rest of the codebase, config (the conf object that Index.create() uses) is an instance of the Config class. At first, that instance has no lib key in its __dict__, so the instance itself has no such attribute:

    (Pdb) 'lib' in config.__dict__
    False
    

    So trying to get config.lib has to fall back to the class, where Python finds the Config.lib attribute, and this is a descriptor object. Instead of using Config.lib directly, Python returns the result of calling Config.lib.__get__(config, Config) in that case.
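    The lookup rule can be reproduced by hand. This sketch uses a stand-in Config whose lib returns a fresh object() instead of the real library, and shows that calling Config.lib.__get__(instance, Config) yourself does the same work as ordinary attribute access:

```python
class CachedProperty(object):
    """Same logic as the CachedProperty in the question (docstring copying omitted)."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __get__(self, instance, instance_type=None):
        if instance is None:
            return self
        value = self.wrapped(instance)
        setattr(instance, self.wrapped.__name__, value)
        return value


class Config:
    @CachedProperty
    def lib(self):
        return object()  # stand-in for the loaded library; a new object per call


a, b = Config(), Config()

# Ordinary attribute access on an instance...
normal = a.lib
# ...is what Python does for you when it finds a descriptor on the class:
manual = Config.lib.__get__(b, Config)

print(type(Config.lib).__name__)  # prints CachedProperty
print("lib" in b.__dict__)        # prints True: the manual call cached the value too
```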

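    The __get__ method then executes the original function (referenced by wrapped), stores the result in config.__dict__ under the same name (via setattr()), and returns it as the result of the attribute lookup. That is where the return value goes: it becomes the object on which clang_createIndex is looked up next. Future access to config.lib will find the stored value in the instance __dict__, and the descriptor on the class is not used after that.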
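    The caching effect is easy to verify with a counter; load_count is a hypothetical attribute added only to count how often the wrapped function runs:

```python
class CachedProperty(object):
    """Same logic as the CachedProperty in the question (docstring copying omitted)."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __get__(self, instance, instance_type=None):
        if instance is None:
            return self
        value = self.wrapped(instance)
        setattr(instance, self.wrapped.__name__, value)
        return value


class Config:
    load_count = 0  # hypothetical counter: how often the "expensive" loader runs

    @CachedProperty
    def lib(self):
        Config.load_count += 1
        return "library #%d" % Config.load_count


conf = Config()
print("lib" in conf.__dict__)  # prints False: only the class has 'lib' so far
print(conf.lib)                # prints library #1 (runs the wrapped function)
print("lib" in conf.__dict__)  # prints True: the value now lives on the instance
print(conf.lib)                # prints library #1 again, without a second call
print(Config.load_count)       # prints 1
```

    This only works because CachedProperty defines no __set__. A descriptor that also defines __set__ (a data descriptor, such as property) takes precedence over the instance __dict__, so the setattr() call would be routed to its __set__ instead of storing the value on the instance.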

    The __getattr__ method is called to satisfy the next attribute in the conf.lib.clang_createIndex(excludeDecls, 0) expression; config.lib returns a dynamically loaded library from cdll.LoadLibrary() (via CachedProperty.__get__()), and that specific object type is handled by the Python ctypes library. A CDLL object translates attribute access into lookups of the C functions the library exports, returning a callable foreign-function object (the _FuncPtr in your pdb output); here that's the clang_createIndex function. See Accessing functions from loaded dlls.
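    The same CDLL machinery can be observed with any shared library. This sketch assumes a POSIX system (Linux or macOS), where ctypes.CDLL(None) exposes the symbols already loaded into the Python process, including the C library's strlen; on Windows you would load a named DLL instead:

```python
import ctypes

# Assumption: POSIX system; CDLL(None) gives access to already-loaded symbols.
libc = ctypes.CDLL(None)

print("strlen" in libc.__dict__)  # prints False: not looked up yet

# Normal lookup fails, so Python falls back to CDLL.__getattr__, which finds
# the symbol and wraps it in a foreign-function object...
strlen = libc.strlen
print(type(strlen).__name__)      # prints _FuncPtr

# ...and caches that object on the CDLL instance, so the next lookup
# no longer goes through __getattr__.
print("strlen" in libc.__dict__)  # prints True
print(libc.strlen is strlen)      # prints True

# Unless told otherwise, ctypes assumes the C function returns an int.
print(strlen.restype is ctypes.c_int)  # prints True
print(strlen(b"hello"))                # prints 5
```

    The restype check is worth keeping in mind: a foreign function returns a plain Python int unless its restype has been set to something else.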

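    Once the call to conf.lib.clang_createIndex(excludeDecls, 0) completes, the resulting object is indeed passed to Index(). The Index class itself has no __init__ method, but its base class ClangObject does, so Python calls ClangObject.__init__(self, obj) with that result as obj; that is the one-argument call you see in pdb.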
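    Index(...) is an ordinary class call: Python creates the instance and then calls the first __init__ it finds along the method resolution order. A minimal sketch with hypothetical stand-in classes (ClangObjectSketch, IndexSketch and the value 12345 are made up; they only mimic the shape of the real classes):

```python
class ClangObjectSketch(object):
    """Hypothetical stand-in for cindex's ClangObject: keeps the wrapped value."""

    def __init__(self, obj):
        self.obj = obj


class IndexSketch(ClangObjectSketch):
    """Like Index: defines no __init__ of its own."""

    @staticmethod
    def create():
        raw = 12345  # hypothetical stand-in for conf.lib.clang_createIndex(...)
        return IndexSketch(raw)  # runs ClangObjectSketch.__init__(self, raw)


idx = IndexSketch.create()
print("__init__" in IndexSketch.__dict__)                  # prints False
print(IndexSketch.__init__ is ClangObjectSketch.__init__)  # prints True
print(idx.obj)                                             # prints 12345
```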

    Whatever that return value is, it has a representation that looks like an integer number. However, it almost certainly is not an int. You can see what type of object that is by using type(), see what attributes it has with dir(), etc. I'm pretty certain it is a ctypes.c_void_p data type representing a clang.cindex.LP_c_void_p value (it is a Python object that proxies for the real C value in memory); it'll represent as an integer:

    Represents the C void * type. The value is represented as integer. The constructor accepts an optional integer initializer.
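    Following the advice to check with type(): this sketch reuses the number from the pdb session and a hypothetical name c_object_p for POINTER(c_void_p), the ctypes type whose class name is LP_c_void_p. It shows that a plain int, a c_void_p and a pointer object are easy to tell apart, both by type() and by repr(), which is what pdb's p command prints:

```python
import ctypes

# Hypothetical name; POINTER(c_void_p) is the ctypes type whose class is LP_c_void_p.
c_object_p = ctypes.POINTER(ctypes.c_void_p)

address = 40998256                        # the number printed in the pdb session
as_int = address                          # a plain Python int
as_void_p = ctypes.c_void_p(address)      # a ctypes void * holding that address
as_lp = ctypes.cast(address, c_object_p)  # a pointer object; nothing is dereferenced

print(repr(as_int))     # prints 40998256
print(repr(as_void_p))  # prints c_void_p(40998256)

for value in (as_int, as_void_p, as_lp):
    # Only the pointer object passes an isinstance() check against c_object_p.
    print(type(value).__name__, isinstance(value, c_object_p))
# prints:
# int False
# c_void_p False
# LP_c_void_p True
```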

    The rest of the clang Python code will just pass this value back to more C calls proxied by config.lib.
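    How the wrapper objects are handed back to C isn't shown in the excerpt above. One mechanism ctypes provides for this is the _as_parameter_ attribute: when an argument isn't a ctypes instance, None, int, bytes or str, ctypes looks for _as_parameter_ on it and passes that instead. A sketch with a hypothetical Handle class, again assuming a POSIX system where ctypes.CDLL(None) exposes the C library's abs():

```python
import ctypes

# Assumption: POSIX system; CDLL(None) exposes the C library's abs().
libc = ctypes.CDLL(None)


class Handle(object):
    """Hypothetical wrapper that ctypes can pass straight to a C function."""

    def __init__(self, value):
        self.value = value
        # ctypes passes this attribute to C in place of the Handle object itself.
        self._as_parameter_ = ctypes.c_int(value)


h = Handle(-7)
print(libc.abs(h))  # prints 7: ctypes unwrapped h via _as_parameter_
```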