Tags: python, pandas, dictionary, namedtuple

How to deal with parsing an arbitrary number of lists into a dictionary


I am parsing an XMI/XML data structure into a pandas dataframe by first decomposing it into a dictionary. When I encounter a list of named tuples in my XMI, the list appears to hold at most two named tuples (although the majority hold only one).

To handle this case, I am doing the following:

if val:  # a non-empty list; this also rules out None
    # The first span is always present.
    d['modifiedBegin'] = val[0].begin
    d['modifiedEnd'] = val[0].end
    if len(val) > 1:
        # Optional second span.
        d['modifiedBegin1'] = val[1].begin
        d['modifiedEnd1'] = val[1].end
    else:
        d['modifiedBegin1'] = None
        d['modifiedEnd1'] = None

My issues with this are: a) I cannot be guaranteed that there are at most two named tuples in the list that I am decomposing, and b) this feels cheap, ugly and just plain wrong!

I really would like to come up with a more general solution, especially given item a) above.

My data look like:

val = [Span(xmiID=105682, begin=13352, end=13358, type='org.metamap.uima.ts.Span'), Span(xmiID=105685, begin=13368, end=13374, type='org.metamap.uima.ts.Span')]

I would much rather parse this out into two separate rows in my dataframe, instead of having more columns. The major issue is that both of these tuples share common data from a larger object that looks like:

Negation(xmiID=142613, id=None, negType='nega', negTrigger='without', modifier=[Span(xmiID=105682, begin=13352, end=13358, type='org.metamap.uima.ts.Span'), Span(xmiID=105685, begin=13368, end=13374, type='org.metamap.uima.ts.Span')]) 

So, both rows share the attributes negType and negTrigger. What is a more general way of decomposing this to insert into my dataframe? I thought of iterating through the elements when the length of the list was greater than one and then inserting into the dataframe on each iteration, but that seems messy.

My desired outcome would thus be to have a dataframe that looks like (minus the indices and other common junk):

[Image of the desired dataframe; not preserved.]


Solution

    • Iterate over Negation namedtuples
      • for each thing in negation.modifier
        • add a row using the negation's attributes and the thing's attributes
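
The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the Span and Negation namedtuple definitions are reconstructed from the reprs shown in the question, the `negations` list is sample data, and the output keys modifiedBegin and modifiedEnd are simply borrowed from the question's own code.

```python
from collections import namedtuple

# Assumed definitions, reconstructed from the reprs in the question.
Span = namedtuple("Span", ["xmiID", "begin", "end", "type"])
Negation = namedtuple(
    "Negation", ["xmiID", "id", "negType", "negTrigger", "modifier"]
)

# Sample data copied from the question.
negations = [
    Negation(
        xmiID=142613, id=None, negType="nega", negTrigger="without",
        modifier=[
            Span(105682, 13352, 13358, "org.metamap.uima.ts.Span"),
            Span(105685, 13368, 13374, "org.metamap.uima.ts.Span"),
        ],
    ),
]

rows = []
for neg in negations:
    # "or []" tolerates a Negation whose modifier is None or empty.
    for span in neg.modifier or []:
        rows.append({
            # Attributes shared by every row of this Negation.
            "negType": neg.negType,
            "negTrigger": neg.negTrigger,
            # Attributes specific to this Span.
            "modifiedBegin": span.begin,
            "modifiedEnd": span.end,
        })

# With pandas installed, a list of dicts becomes one row per dict:
# df = pandas.DataFrame(rows)
```

Because the inner loop runs once per Span, this works for any number of items in modifier, and no numbered columns such as modifiedBegin1 are needed.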

    Or, instead of parsing XML to namedtuples and then to dictionaries, skip the middle step and build a single dictionary of column lists directly from the XML: {'begin':[row0,row1,...],'end':[row0,row1,...],'negtrigger':[row0,row1,...],'negtype':[row0,row1,...]}
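
A sketch of that second idea, under loud assumptions: the question never shows the actual XMI, so the XML below is a hypothetical, simplified layout (no namespaces, a plain `id` attribute, and a `modifier` attribute holding space-separated Span ids). The real element and attribute names will differ and the lookups would need adjusting accordingly.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified XML. The real XMI layout is not shown in the
# question, so every element and attribute name here is an assumption.
XML = """
<root>
  <Span id="105682" begin="13352" end="13358"/>
  <Span id="105685" begin="13368" end="13374"/>
  <Negation id="142613" negType="nega" negTrigger="without"
            modifier="105682 105685"/>
</root>
"""

root = ET.fromstring(XML)

# Index the Span elements by id so each Negation can look up its modifiers.
spans = {el.get("id"): el for el in root.iter("Span")}

columns = defaultdict(list)
for neg in root.iter("Negation"):
    # One output row per referenced Span; the shared attributes repeat.
    for span_id in neg.get("modifier", "").split():
        span = spans[span_id]
        columns["begin"].append(int(span.get("begin")))
        columns["end"].append(int(span.get("end")))
        columns["negtrigger"].append(neg.get("negTrigger"))
        columns["negtype"].append(neg.get("negType"))

columns = dict(columns)

# With pandas installed, a dict of equal-length lists becomes a dataframe:
# df = pandas.DataFrame(columns)
```

Appending to all four lists in the same loop iteration keeps the columns the same length, which pandas requires when building a dataframe from a dict of lists.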