python html pyquery

How to parse HTML table using pyquery?


How can I parse the HTML table at http://pastie.org/pastes/8556919 using pyquery?

Expected result:

    {
        "category_1": {"cat1_el1_label": "cat1_el1_value"},
        "category_2": {"cat2_el1_label": "cat2_el1_value"},
        "category_3": {"cat3_el1_label": "cat3_el1_value"}
    }

Thank you very much.


Solution

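  • Simple way: walk the rows in order, remember the most recent th.title heading, and store each label/value pair under that heading: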

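    from pyquery import PyQuery
    from collections import defaultdict

    # html is the table markup as a string
    doc = PyQuery(html)
    values = defaultdict(dict)
    title = None  # heading of the category currently being read
    for tr in doc('tr').items():
        if tr('th.title'):
            # a heading row starts a new category
            title = tr('th.title').text()
        elif title is not None:
            # a data row: pair each label cell with its value cell
            items = zip(tr('.properties_label').items(),
                        tr('.properties_value').items())
            values[title].update({k.text(): v.text() for k, v in items})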
    

    Result (as printed under Python 2):

    defaultdict(<type 'dict'>, {'Category_3': {'cat3_el1_label': 'cat3_el1_value'},
                                'Category_2': {'cat2_el1_label': 'cat2_el1_value'},
                                'Category_1': {'cat1_el1_label': 'cat1_el1_value'}})