
Why doesn't "__getattr__" work when added through Python "ctypes"?


I've dropped the "datetime" example and rewritten the question around a real example using lxml.
(Apologies if my English reads strangely; I translated it with Google Translate.)

I like lxml for its very good performance, but code that uses it is hard to read.
I work with XML a lot, so I frequently have to modify the Python code that processes it.
Once some time has passed and I've forgotten the details, that code is very difficult to understand,
and debugging and fixing it takes a lot of time.
For example, searching a deep XML hierarchy usually looks like this:

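import lxml.etree

elem = lxml.etree.parse("xxx/xxx/sample.xml").getroot()

# XPath: text of the first <depth3> anywhere in the document
elem.xpath("//depth3/text()")[0]

# ElementPath: walk the hierarchy explicitly
elem.find("./depth1/depth2/depth3").text          # element text
elem.find("./depth1/depth2/depth3").get("attr1")  # attribute value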
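lxml.etree follows the ElementTree API, so the difference between an element's .text and an attribute value returned by .get() can be tried out with the standard library's xml.etree.ElementTree. This is a sketch under assumptions: SAMPLE is a hypothetical stand-in for sample.xml (whose contents aren't shown), and xpath() exists only in lxml, so just find() is used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document standing in for "xxx/xxx/sample.xml"
SAMPLE = """
<root>
  <depth1>
    <depth2>
      <depth3 attr1="value1">deep text</depth3>
    </depth2>
  </depth1>
</root>
"""

root = ET.fromstring(SAMPLE)

# Explicit ElementPath, same syntax lxml's find() accepts
depth3 = root.find("./depth1/depth2/depth3")
print(depth3.text)          # element text
print(depth3.get("attr1"))  # attribute value: a plain str, so it has no .text

# ".//" searches all descendants, similar to XPath's "//"
print(root.find(".//depth3").text)
```

Note that .get() already returns the attribute's string value, which is why chaining .text onto it fails.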

I wanted to be able to write it like this instead.
(This code is only for my own use.)

elem.depth3.text (Ex.1)
OR
elem.depth1.depth2.depth3.text (Ex.2)

My first attempt at implementing this used class inheritance.
I customized it a little, following "Using custom Element classes in lxml",
and used __getattr__ to search for the XML element.

from lxml import etree

class CustomElement(etree.ElementBase):
    def __getattr__(self, k):
        # Search the whole document for elements named k and cache the result
        ret = self.xpath("//" + k)
        setattr(self, k, ret)
        return getattr(self, k)

Example (Ex.1) succeeds.
But example (Ex.2) raises an AttributeError: the object returned for depth1 is an instance of etree._Element, which has no __getattr__.
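For chained access like (Ex.2), two conditions have to hold: every element in the tree (not only the root) must be an instance of the custom class, and __getattr__ must return a single element rather than a list such as the one xpath() returns, because a list has no .depth2 attribute. The sketch below illustrates this with the standard library's xml.etree.ElementTree instead of lxml, so it is an analogy, not the lxml API; AttrElement and parse_with_attr_access are hypothetical names.

```python
import xml.etree.ElementTree as ET


class AttrElement(ET.Element):
    """Hypothetical Element subclass: unknown attribute names are treated
    as descendant tag names, so root.depth1.depth2.depth3 walks the tree."""

    def __getattr__(self, name):
        # Only reached when normal lookup (tag, text, find, ...) fails.
        # Names starting with "_" are rejected so Python internals are not
        # mistaken for tags (tags starting with "_" become unreachable).
        if name.startswith("_"):
            raise AttributeError(name)
        child = self.find(".//" + name)  # first matching descendant
        if child is None:
            raise AttributeError(name)
        return child  # a single element, so the chain can continue


def parse_with_attr_access(text):
    # element_factory makes the parser create EVERY element (children too)
    # as an AttrElement, which is what keeps the chain working past depth1.
    builder = ET.TreeBuilder(element_factory=AttrElement)
    parser = ET.XMLParser(target=builder)
    return ET.fromstring(text, parser=parser)


root = parse_with_attr_access(
    "<root><depth1><depth2>"
    "<depth3 attr1='v'>deep</depth3>"
    "</depth2></depth1></root>"
)
print(root.depth3.text)                # (Ex.1) style
print(root.depth1.depth2.depth3.text)  # (Ex.2) style
```

Returning only the first match is a design trade-off: repeated sibling elements can't be reached by attribute name alone, which the answer below addresses with an index suffix.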

(Supplement: in my first question I used the example of adding a "millisecond" method to "datetime". It isn't practical, but it was easy to understand.)

Then I thought of adding the function directly to lxml's Element class, using the ctypes module.

import ctypes
import lxml.etree

class PyObject_HEAD(ctypes.Structure):
    _fields_ = [
        ('HEAD', ctypes.c_ubyte * (object.__basicsize__ -
                           ctypes.sizeof(ctypes.c_void_p))),
        ('ob_type', ctypes.c_void_p)
    ]
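
def __getattr__(self, k):
    # Search the whole document for elements named k and cache the result
    ret = self.xpath("//" + k)
    setattr(self, k, ret)
    return getattr(self, k)

# _PyObject_GetDictPtr returns a pointer to an object's dict slot; for a
# type object this is the type's own dict (tp_dict)
_get_dict          = ctypes.pythonapi._PyObject_GetDictPtr
_get_dict.restype  = ctypes.POINTER(ctypes.py_object)
_get_dict.argtypes = [ctypes.py_object]

# Write __getattr__ directly into lxml.etree._Element's type dict
EE = _get_dict(lxml.etree._Element).contents.value
EE["__getattr__"] = __getattr__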

elem = lxml.etree.parse("xxx/xxx/sample.xml").getroot()
elem.xpath("//depth3")[0]

=> returns an _Element object

from inspect import getsource
print getsource(elem.__getattr__)

=> def __getattr__(self, k):
=>     ret = self.xpath("//" + k)
=>     setattr(self, k, ret)
=>     return getattr(self, k)
So the function was added.

elem.depth3

=> AttributeError .. no attribute 'depth3'

I don't know whether I should be doing this through "PyObject_GetAttr" instead, or how.
Please let me know.

Best regards


====================Previous Question===================================
I'm trying to extend built-in types using ctypes. Adding an ordinary function usually works. However, adding a special method does not. Why?

import ctypes as c

class PyObject_HEAD(c.Structure):
    _fields_ = [
        ('HEAD', c.c_ubyte * (object.__basicsize__ -
                              c.sizeof(c.c_void_p))),
        ('ob_type', c.c_void_p)
    ]

pgd = c.pythonapi._PyObject_GetDictPtr
pgd.restype = c.POINTER(c.py_object)
pgd.argtypes = [c.py_object]

import datetime

def millisecond(td):
    return td.microsecond // 1000

d = pgd(datetime.datetime)[0]
d["millisecond"] = millisecond

now = datetime.datetime.now()
print now.millisecond(), now.microsecond

This prints 155 155958, Ok!

def __getattr__(self, k):
    return self, k

d["__getattr__"] = __getattr__

now = datetime.datetime
print now.hoge

This doesn't work, why?

Traceback (most recent call last):
  File "xxxtmp.py", line 31, in <module>
    print now.hoge
AttributeError: type object 'datetime.datetime' has no attribute 'hoge'
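The same effect can be reproduced without ctypes on an ordinary Python class. The sketch below uses gc.get_referents() to reach the real dict behind a class's read-only __dict__ mappingproxy, a CPython-specific hack shown only for illustration; Hacked, Patched, fallback and try_getattr are hypothetical names. Writing __getattr__ into that dict directly leaves the class's C-level tp_getattro slot untouched, whereas a normal assignment goes through type.__setattr__, which in CPython also updates the slot.

```python
import gc


def fallback(self, name):
    # Hypothetical __getattr__ body: report the missing name instead of failing
    return ("fallback", name)


def try_getattr(obj, name):
    """Return obj.<name>, or the string "AttributeError" if lookup fails."""
    try:
        return getattr(obj, name)
    except AttributeError:
        return "AttributeError"


class Hacked:
    pass


class Patched:
    pass


# CPython-specific hack: the only referent of a mappingproxy is the dict it
# wraps, so this writes into the class dict behind type.__setattr__'s back,
# much like the ctypes _PyObject_GetDictPtr trick. No slot is updated.
gc.get_referents(Hacked.__dict__)[0]["__getattr__"] = fallback

# Normal assignment: type.__setattr__ stores the function AND rewires the
# tp_getattro slot so that __getattr__ is consulted on failed lookups.
Patched.__getattr__ = fallback

print("__getattr__" in Hacked.__dict__)  # True: the function is in the dict
print(try_getattr(Hacked(), "hoge"))     # AttributeError: but it is never called
print(try_getattr(Patched(), "hoge"))    # ('fallback', 'hoge')
```

This mirrors what the question observes: the function is present in the type's dict (getsource finds it), yet attribute lookup never consults it.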

Solution

  • PyObject_GetAttr (Objects/object.c) uses the type's tp_getattro slot, or tp_getattr if the former isn't defined. It doesn't look up __getattribute__ in the MRO of the type.

    For a custom __getattr__ you'll need to subclass datetime. Your heap type will use slot_tp_getattr_hook (Objects/typeobject.c) as its tp_getattro. This function will look for __getattribute__ and __getattr__ in the type's MRO by calling _PyType_Lookup (Objects/typeobject.c).


    Given your update, see "using custom Element classes in lxml". For multiple results I've hacked a __getattr__ hook that uses a suffix notation for the index. It defaults to index 0 otherwise. Admittedly I haven't given it much thought, but clashes with existing names can be avoided if you always use the index.

    from lxml import etree
    
    def make_parser(element):
        # Parser that creates every element as an instance of `element`
        lookup = etree.ElementDefaultClassLookup(element=element)
        parser = etree.XMLParser()
        parser.set_element_class_lookup(lookup)
        return parser
    
    class CustomElement(etree.ElementBase):
        def __getattr__(self, attr):
            # "foo_1" -> name "foo", index 1; anything else -> index 0
            try:
                name, index = attr.rsplit('_', 1)
                index = int(index)
            except ValueError:
                name = attr
                index = 0
            # Relative XPath: child elements named `name`
            return self.xpath(name)[index]
    
    parser = make_parser(CustomElement)
    

    For example:

    >>> spam = etree.fromstring(r'''
    ... <spam>
    ...     <foo>
    ...         <bar>eggs00</bar>
    ...         <bar>eggs01</bar>
    ...     </foo>
    ...     <foo>
    ...         <bar>eggs10</bar>
    ...         <bar>eggs11</bar>
    ...     </foo>
    ... </spam>
    ... ''', parser)
    
    >>> spam.foo_0.bar_0.text
    'eggs00'
    >>> spam.foo_0.bar_1.text
    'eggs01'
    >>> spam.foo_1.bar_0.text
    'eggs10'
    >>> spam.foo_1.bar_1.text
    'eggs11'
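Following the answer's first point, here is a sketch for the original datetime question: subclass datetime.datetime and define millisecond and __getattr__ on the subclass. MyDateTime is a hypothetical name, and the fallback return value only mirrors the question's (self, k) experiment.

```python
import datetime


class MyDateTime(datetime.datetime):
    """Hypothetical datetime subclass. Because it is defined in Python it is a
    heap type, so CPython routes failed lookups to __getattr__ below."""

    @property
    def millisecond(self):
        # Same computation as the ctypes-injected millisecond() in the question
        return self.microsecond // 1000

    def __getattr__(self, name):
        # Only reached after normal lookup fails. Leave dunder names alone so
        # protocols such as copy.deepcopy still see a proper AttributeError.
        if name.startswith("__"):
            raise AttributeError(name)
        return ("fallback", name)


d = MyDateTime(2014, 1, 2, 3, 4, 5, 155958)
print(d.millisecond, d.microsecond)      # 155 155958
print(d.hoge)                            # ('fallback', 'hoge')
print(type(MyDateTime.now()).__name__)   # MyDateTime: now() is a classmethod
```

One more detail of the question's test: it looks up hoge on the class datetime.datetime itself. A __getattr__ defined in a class applies to that class's instances; attribute lookup on the class object goes through its metaclass (type), so the test needs an instance.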