How can I count the total number of elements in each cell of a DataFrame column, including the elements of nested sequences, and put the result in a new column?
import pandas as pd
data = [[1, (2,5,6)], [2, (3,4)], [3, 4], [(5,6), (7,8,9)]]
x = pd.Series(data, index=range(1, len(data)+1))
df = pd.DataFrame({'A': x})
I tried the following code, but it gives 2 for every row:
df['Length'] = df['A'].apply(len)
print(df)
                     A  Length
1       [1, (2, 5, 6)]       2
2          [2, (3, 4)]       2
3               [3, 4]       2
4  [(5, 6), (7, 8, 9)]       2
However, what I want to get is the following:
                     A  Length
1       [1, (2, 5, 6)]       4
2          [2, (3, 4)]       3
3               [3, 4]       2
4  [(5, 6), (7, 8, 9)]       5
thanks
Given:
import pandas as pd
x = pd.Series([[1, (2,5,6)], [2, (3,4)], [3, 4], [(5,6), (7,8,9)]])
df = pd.DataFrame({'A': x})
You can write a recursive generator that yields 1 for each nested element that is not iterable (strings are treated as single elements). Something along these lines:
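import collections
try:
    from collections.abc import Iterable  # Python 3.3+ (required from 3.10)
except ImportError:
    Iterable = collections.Iterable  # Python 2

def glen(LoS):
    # Treat strings and non-iterables as single elements.
    def iselement(e):
        return not (isinstance(e, Iterable) and not isinstance(e, str))
    for el in LoS:
        if iselement(el):
            yield 1
        else:
            for sub in glen(el):
                yield sub

df['Length'] = df['A'].apply(lambda e: sum(glen(e)))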
Yielding:
>>> df
                     A  Length
0       [1, (2, 5, 6)]       4
1          [2, (3, 4)]       3
2               [3, 4]       2
3  [(5, 6), (7, 8, 9)]       5
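That will work in Python 2 or 3 (on Python 3.10 and later, Iterable must come from collections.abc, since the collections.Iterable alias was removed). With Python 3.3 or later, you can use yield from to replace the inner loop: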
from collections.abc import Iterable

def glen(LoS):
    def iselement(e):
        return not (isinstance(e, Iterable) and not isinstance(e, str))
    for el in LoS:
        if iselement(el):
            yield 1
        else:
            yield from glen(el)
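As a quick check of the behavior outside pandas, here is a minimal self-contained sketch (Python 3.3+) that repeats the generator in a slightly condensed form and applies it to a few plain Python values. The helper name count_elements and the extra sample inputs are made up for illustration; they are not part of the question's data.

```python
from collections.abc import Iterable

def glen(LoS):
    """Yield 1 for every leaf element in an arbitrarily nested iterable."""
    for el in LoS:
        # Strings are iterable, but here they count as a single element.
        if isinstance(el, Iterable) and not isinstance(el, str):
            yield from glen(el)
        else:
            yield 1

def count_elements(value):
    # Hypothetical convenience wrapper: total number of leaf elements.
    return sum(glen(value))

print(count_elements([1, (2, 5, 6)]))        # 4
print(count_elements([(5, 6), (7, 8, 9)]))   # 5
print(count_elements([1, [2, [3, [4]]]]))    # 4, nesting depth does not matter
print(count_elements(['abc', ('de', 'f')]))  # 3, each string counts once
print(count_elements([[], ()]))              # 0, empty containers add nothing
```

Only str is special-cased here, so other iterables are descended into: a dict would contribute one count per key, and a bytes object one count per byte. If that is not what you want, extend the isinstance check accordingly.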