python, pandas, lxml, series, lxml.objectify

Why does a Pandas Series created from a list appear enclosed in square brackets?


I'm trying to create a Series in Pandas from a list of dates presented as strings, thus:

['2016-08-09',
 '2015-08-03',
 '2017-08-15',
 '2017-12-14',
...

but when I pass the list to pd.Series, the result in a Jupyter notebook displays as:

0       [[[2016-08-09]]]
1       [[[2015-08-03]]]
2       [[[2017-08-15]]]
3       [[[2017-12-14]]]
...

Is there a simple way to fix it? The data comes from an XML feed parsed using lxml.objectify.

I don't normally get these problems when reading from CSV, and I'm just curious what I might be doing wrong.

UPDATE:

The code to grab the data and an example site:

import lxml.objectify
import pandas as pd

def parse_sitemap(url):
    # Parse the XML document and get the root element of the tree
    root = lxml.objectify.parse(url)
    rooted = root.getroot()
    # First and second sub-element of each top-level entry
    output_1 = [child.getchildren()[0] for child in rooted.getchildren()]
    output_0 = [child.getchildren()[1] for child in rooted.getchildren()]  # computed but not returned
    return output_1

results = parse_sitemap("sitemap.xml")
pd.Series(results)

Solution

  • If you print type(results[0]), you'll see that what you get is not a string:

    print(type(results[0]))
    

    Output:

    <class 'lxml.objectify.StringElement'>
    

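    This is an lxml element object rather than a plain str, and pandas does not display it cleanly (most likely because the element behaves like a sequence, so pandas prints it as a nested list). The fix is easy: convert the values to strings using pd.Series.astype: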

    s = pd.Series(results).astype(str)
    print(s)
    
    0     2017-08-09T11:20:38Z
    1     2017-08-09T11:10:55Z
    2     2017-08-09T15:36:20Z
    3     2017-08-09T16:36:59Z
    4     2017-08-02T09:56:50Z
    5     2017-08-02T19:33:31Z
    6     2017-08-03T07:32:24Z
    7     2017-08-03T07:35:35Z
    8     2017-08-03T07:54:12Z
    9     2017-07-31T16:38:34Z
    10    2017-07-31T15:42:24Z
    11    2017-07-31T15:44:56Z
    12    2017-07-31T15:23:25Z
    13    2017-08-01T08:30:27Z
    14    2017-08-01T11:01:57Z
    15    2017-08-03T13:52:39Z
    16    2017-08-03T14:29:55Z
    17    2017-08-03T13:39:24Z
    18    2017-08-03T13:39:00Z
    19    2017-08-03T15:30:58Z
    20    2017-08-06T11:29:24Z
    21    2017-08-03T10:19:43Z
    22    2017-08-14T18:42:49Z
    23    2017-08-15T15:42:04Z
    24    2017-08-17T08:58:19Z
    25    2017-08-18T13:37:52Z
    26    2017-08-18T13:38:14Z
    27    2017-08-18T13:45:42Z
    28    2017-08-03T09:56:42Z
    29    2017-08-01T11:01:22Z
    dtype: object
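
As an alternative to astype(str), the elements can be converted to plain strings while the list is being built, so the Series never holds lxml objects. The sketch below is a minimal reproduction under stated assumptions: the inline SITEMAP document and its example.com URLs are made up for illustration and stand in for the real feed, and the second child of each entry is assumed to hold the timestamp. It also shows pd.to_datetime as one possible next step if real dates are wanted.

```python
import lxml.objectify
import pandas as pd

# Hypothetical inline sitemap used only for illustration (not the real feed).
SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc><lastmod>2017-08-09T11:20:38Z</lastmod></url>
  <url><loc>https://example.com/b</loc><lastmod>2017-08-02T09:56:50Z</lastmod></url>
</urlset>"""

root = lxml.objectify.fromstring(SITEMAP)

# The raw child is an lxml.objectify element, not a str.
raw = root.getchildren()[0].getchildren()[1]
print(type(raw).__name__)

# Convert each element to str while building the list
# (assumes the second child of each entry is the timestamp).
lastmods = [str(url.getchildren()[1]) for url in root.getchildren()]

s = pd.Series(lastmods)    # holds ordinary Python strings
dates = pd.to_datetime(s)  # optional: parse the strings into timestamps
print(s)
print(dates)
```

Converting at the source keeps the rest of the pipeline free of lxml types; astype(str) on the finished Series gives the same strings if the list has already been built.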