I have dropped the "datetime" example and rewritten the question around a real lxml example.
(My English may read strangely because it was translated with Google Translate. I'm sorry.)
I like lxml because of its very good performance, but the source is hard to read.
When you work with XML heavily, the Python code that handles it also has to be modified frequently.
Once enough time has passed that I have forgotten the details, the source is very difficult to understand,
and debugging and fixing it takes me a long time.
For example, I think you would usually search a deep XML hierarchy as follows:
elem = lxml.etree.parse("xxx/xxx/sample.xml").getroot()
elem.xpath("//depth3/text()")[0]
elem.find("./depth1/depth2/depth3").get("attr1")
I would like to write it as follows instead.
(This code is only for my own use.)
elem.depth3.text (Ex.1)
OR
elem.depth1.depth2.depth3.text (Ex.2)
I first tried to implement this with class inheritance.
I customized it a little, referring to "Using custom Element classes in lxml".
I used __getattr__ to search for an XML element.
from lxml import etree
class CustomElement(etree.ElementBase):
    def __getattr__(self, k):
        ret = self.xpath("//" + k)
        setattr(self, k, ret)
        return getattr(self, k)
Example (Ex.1) succeeds.
But example (Ex.2) raises AttributeError, because __getattr__ is not present on the etree._Element instance that is returned for depth1.
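For chained access like (Ex.2) to work, every object returned by an attribute lookup has to support the same lookup itself. Here is a minimal sketch of that idea. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml, and the `Node` wrapper class and the sample XML are hypothetical names made up for illustration, not part of lxml.

```python
import xml.etree.ElementTree as ET

class Node(object):
    """Hypothetical wrapper: attribute access finds the first matching descendant."""

    def __init__(self, elem):
        self._elem = elem

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so _elem is found normally.
        if name.startswith("_"):
            raise AttributeError(name)
        found = self._elem.find(".//" + name)
        if found is None:
            raise AttributeError(name)
        # Wrap the result again so the next attribute access also works.
        return Node(found)

    @property
    def text(self):
        return self._elem.text

xml_source = "<root><depth1><depth2><depth3>value</depth3></depth2></depth1></root>"
root = Node(ET.fromstring(xml_source))
short_form = root.depth3.text                 # like (Ex.1)
long_form = root.depth1.depth2.depth3.text    # like (Ex.2)
```

Because each lookup returns another `Node`, both the short and the long form resolve to the same element.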
(Supplement) Although it was not practical, I used the example of adding a "millisecond" to "datetime" in the first question because it was easy to understand.
I then thought of a way to add functions to the Element class of lxml using the ctypes module.
import ctypes
import lxml.etree
class PyObject_HEAD(ctypes.Structure):
    _fields_ = [
        ('HEAD', ctypes.c_ubyte * (object.__basicsize__ -
                                   ctypes.sizeof(ctypes.c_void_p))),
        ('ob_type', ctypes.c_void_p)
    ]
def __getattr__(self, k):
    ret = self.xpath("//" + k)
    setattr(self, k, ret)
    return getattr(self, k)
_get_dict = ctypes.pythonapi._PyObject_GetDictPtr
_get_dict.restype = ctypes.POINTER(ctypes.py_object)
_get_dict.argtypes = [ctypes.py_object]
EE = _get_dict(lxml.etree._Element).contents.value
EE["__getattr__"] = __getattr__
elem = lxml.etree.parse("xxx/xxx/sample.xml").getroot()
elem.xpath("//depth3")[0]
=> returns an _Element object
from inspect import getsource
print getsource(elem.__getattr__)
=> def __getattr__(self, k):
=>     ret = self.xpath("//" + k)
=>     setattr(self, k, ret)
=>     return getattr(self, k)
So the source has been added.
elem.depth3
=> AttributeError .. no attribute 'depth3'
I do not know whether I should use "PyObject_GetAttr" here, or how to write it if so.
Please tell me if you know.
Best regards
====================Previous Question===================================
I'm trying to extend built-in types using ctypes.
Adding an ordinary function usually works well.
However, it does not work when I add a special method. Why?
import ctypes as c
class PyObject_HEAD(c.Structure):
    _fields_ = [
        ('HEAD', c.c_ubyte * (object.__basicsize__ -
                              c.sizeof(c.c_void_p))),
        ('ob_type', c.c_void_p)
    ]
pgd = c.pythonapi._PyObject_GetDictPtr
pgd.restype = c.POINTER(c.py_object)
pgd.argtypes = [c.py_object]
import datetime
def millisecond(td):
    return (td.microsecond / 1000)
d = pgd(datetime.datetime)[0]
d["millisecond"] = millisecond
now = datetime.datetime.now()
print now.millisecond(), now.microsecond
This prints 155 155958, OK!
def __getattr__(self, k):
    return self, k
d["__getattr__"] = __getattr__
now = datetime.datetime
print now.hoge
This doesn't work, why?
Traceback (most recent call last):
  File "xxxtmp.py", line 31, in <module>
    print now.hoge
AttributeError: type object 'datetime.datetime' has no attribute 'hoge'
PyObject_GetAttr (Objects/object.c) uses the type's tp_getattro slot, or tp_getattr if the former isn't defined. It doesn't look up __getattribute__ in the MRO of the type.
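To illustrate the difference between a slot and a dict entry, here is a small sketch, assuming CPython. The names `Plain` and `fallback` are made up for illustration. Assigning `__getattr__` to an ordinary Python class goes through the type's own attribute setter, which also updates the slot, whereas a C type such as datetime.datetime refuses the assignment entirely. Writing into the type's dict behind its back, as the question's ctypes code does, skips the slot update.

```python
import datetime

class Plain(object):
    """Hypothetical ordinary Python class (a heap type)."""

def fallback(self, name):
    # Same shape as the __getattr__ in the question.
    return self, name

# Normal assignment on a heap type updates the attribute-lookup slot too.
Plain.__getattr__ = fallback
p = Plain()
result = p.hoge

# A C-implemented type rejects the same assignment with TypeError.
try:
    datetime.datetime.__getattr__ = fallback
    patched = True
except TypeError:
    patched = False
```

After this runs, `result` is the tuple `(p, 'hoge')` and `patched` is `False` on CPython.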
For a custom __getattr__ you'll need to subclass datetime. Your heap type will use slot_tp_getattr_hook (Objects/typeobject.c) as its tp_getattro. This function will look for __getattribute__ and __getattr__ in the type's MRO by calling _PyType_Lookup (Objects/typeobject.c).
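As a rough sketch of the subclass approach (the class name `MyDateTime` is hypothetical, and the `millisecond` attribute is borrowed from the question's example):

```python
import datetime

class MyDateTime(datetime.datetime):
    """Hypothetical subclass; being a heap type, its __getattr__ is honored."""

    def __getattr__(self, name):
        # Called only after normal attribute lookup has failed.
        if name == "millisecond":
            return self.microsecond // 1000
        raise AttributeError(name)

dt = MyDateTime(2020, 1, 2, 3, 4, 5, 155958)
print(dt.millisecond)
```

Existing attributes such as `dt.microsecond` are still found by the normal lookup, so `__getattr__` only handles the names that would otherwise raise AttributeError.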
Given your update, see "using custom Element classes in lxml". For multiple results I've hacked a __getattr__ hook that uses a suffix notation for the index. It defaults to index 0 otherwise. Admittedly I haven't given it much thought, but clashes with existing names can be avoided if you always use the index.
from lxml import etree
def make_parser(element):
    lookup = etree.ElementDefaultClassLookup(element=element)
    parser = etree.XMLParser()
    # set_element_class_lookup replaces the deprecated setElementClassLookup
    parser.set_element_class_lookup(lookup)
    return parser
class CustomElement(etree.ElementBase):
    def __getattr__(self, attr):
        try:
            # "foo_1" -> name "foo", index 1
            name, index = attr.rsplit('_', 1)
            index = int(index)
        except ValueError:
            # no numeric suffix: use the whole name and the first match
            name = attr
            index = 0
        return self.xpath(name)[index]
parser = make_parser(CustomElement)
For example:
>>> spam = etree.fromstring(r'''
... <spam>
...   <foo>
...     <bar>eggs00</bar>
...     <bar>eggs01</bar>
...   </foo>
...   <foo>
...     <bar>eggs10</bar>
...     <bar>eggs11</bar>
...   </foo>
... </spam>
... ''', parser)
>>> spam.foo_0.bar_0.text
'eggs00'
>>> spam.foo_0.bar_1.text
'eggs01'
>>> spam.foo_1.bar_0.text
'eggs10'
>>> spam.foo_1.bar_1.text
'eggs11'
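The suffix parsing inside CustomElement.__getattr__ can also be tried on its own, without lxml. This is only a sketch: `split_indexed_name` is a hypothetical helper that mirrors the try/except above, not something lxml provides.

```python
def split_indexed_name(attr):
    """Hypothetical helper: split 'name_N' into (name, N), defaulting to index 0."""
    try:
        name, index = attr.rsplit('_', 1)
        index = int(index)
    except ValueError:
        # Raised when there is no underscore or the suffix is not an integer.
        name, index = attr, 0
    return name, index

plain = split_indexed_name("foo")
indexed = split_indexed_name("foo_1")
non_numeric = split_indexed_name("foo_bar")
nested = split_indexed_name("foo_bar_2")
```

One consequence of this notation is that a tag whose real name ends in an underscore and digits would be read as a name plus an index, so such tags would need a different lookup.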