I am parsing an XMI/XML data structure into a pandas dataframe by first decomposing it into a dictionary. Some attributes in my XMI hold a list of named tuples. So far each list appears to contain at most two named tuples, and the majority contain only one.
To handle this case, I am doing the following:
    if val:
        # The first span is always present when val is non-empty.
        d['modifiedBegin'] = val[0].begin
        d['modifiedEnd'] = val[0].end
        if len(val) == 1:
            d['modifiedBegin1'] = None
            d['modifiedEnd1'] = None
        else:
            d['modifiedBegin1'] = val[1].begin
            d['modifiedEnd1'] = val[1].end
My issues with this are: a) I cannot be guaranteed that there are at most two named tuples in the list that I am decomposing, and b) this feels cheap, ugly and just plain wrong!
I really would like to come up with a more general solution, especially given item a) above.
My data look like:
val = [Span(xmiID=105682, begin=13352, end=13358, type='org.metamap.uima.ts.Span'), Span(xmiID=105685, begin=13368, end=13374, type='org.metamap.uima.ts.Span')]
I would much rather parse this out into two separate rows in my dataframe instead of adding more columns. The major issue is that both of these tuples share common data from a larger object that looks like:
Negation(xmiID=142613, id=None, negType='nega', negTrigger='without', modifier=[Span(xmiID=105682, begin=13352, end=13358, type='org.metamap.uima.ts.Span'), Span(xmiID=105685, begin=13368, end=13374, type='org.metamap.uima.ts.Span')])
So both rows share the attributes negType and negTrigger. What is a more general way of decomposing this to insert into my dataframe? I thought of iterating through the elements when the length of the list was greater than one and inserting into the dataframe on each iteration, but that seems messy.
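One way to generalize to any number of spans is to emit one row per span and repeat the shared attributes on each row. The following is only a sketch: the Span and Negation definitions are assumptions reconstructed from the reprs shown above, and negation_rows is a hypothetical helper name.

```python
from collections import namedtuple

# Assumed definitions, reconstructed from the reprs in the question.
Span = namedtuple("Span", ["xmiID", "begin", "end", "type"])
Negation = namedtuple("Negation", ["xmiID", "id", "negType", "negTrigger", "modifier"])


def negation_rows(negation):
    """Yield one dict per modifier span, repeating the shared attributes."""
    # `or []` covers a modifier that is None or empty.
    for span in negation.modifier or []:
        yield {
            "negType": negation.negType,
            "negTrigger": negation.negTrigger,
            "begin": span.begin,
            "end": span.end,
        }


negation = Negation(
    xmiID=142613,
    id=None,
    negType="nega",
    negTrigger="without",
    modifier=[
        Span(xmiID=105682, begin=13352, end=13358, type="org.metamap.uima.ts.Span"),
        Span(xmiID=105685, begin=13368, end=13374, type="org.metamap.uima.ts.Span"),
    ],
)

# Flatten every negation into rows; works for 0, 1, 2 or more spans.
negations = [negation]
rows = [row for n in negations for row in negation_rows(n)]
```

Because rows is a plain list of dicts, it can be handed to pd.DataFrame(rows) in one call, which avoids inserting into the dataframe on every iteration.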
My desired outcome would thus be to have a dataframe that looks like (minus the indices and other common junk):
Negation
namedtuples
negation.modifier
Or, instead of parsing XML to namedtuples to dictionaries, skip the middle part and create a single dictionary, {'begin':[row0,row1,...],'end':[row0,row1,...],'negtrigger':[row0,row1,...],'negtype':[row0,row1,...]}, directly from the XML.
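A rough sketch of that idea, using the standard library's xml.etree.ElementTree. The XMI fragment here is invented for illustration: the element names, the namespace URIs, and the assumption that a Negation refers to its spans through a space-separated list of xmi:id values may not match the real file.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical XMI fragment; the real file's layout and namespaces may differ.
XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:ts="http:///org/metamap/uima/ts.ecore">
  <ts:Span xmi:id="105682" begin="13352" end="13358"/>
  <ts:Span xmi:id="105685" begin="13368" end="13374"/>
  <ts:Negation xmi:id="142613" negType="nega" negTrigger="without" modifier="105682 105685"/>
</xmi:XMI>"""

# ElementTree exposes namespaced names in {uri}local form.
XMI_ID = "{http://www.omg.org/XMI}id"
TS = "{http:///org/metamap/uima/ts.ecore}"

root = ET.fromstring(XMI)

# Index spans by xmi:id so each negation's references can be resolved.
spans = {el.get(XMI_ID): el for el in root.iter(TS + "Span")}

# Build the column-oriented dictionary directly: one list entry per span.
columns = defaultdict(list)
for neg in root.iter(TS + "Negation"):
    for ref in neg.get("modifier", "").split():
        span = spans[ref]
        columns["begin"].append(int(span.get("begin")))
        columns["end"].append(int(span.get("end")))
        columns["negtrigger"].append(neg.get("negTrigger"))
        columns["negtype"].append(neg.get("negType"))
columns = dict(columns)
```

Since every list in columns has the same length, pd.DataFrame(columns) would give one row per span with the shared negation attributes repeated.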