The code goes something like this:
>>> data = pd.DataFrame({'P': ['p1', 'p1', 'p2'],
...                      'Q': ['q1', 'q2', 'q1'],
...                      'R': ['r1', 'r1', 'r2']})
>>> data
    P   Q   R
0  p1  q1  r1
1  p1  q2  r1
2  p2  q1  r2
>>> data.groupby(['R'] + ['P','Q']).size().unstack(['P','Q'])
After reindexing and fillna(0) it gives the following result:
P   p1      p2
Q   q1  q2  q1  q2
R
r1   1   1   0   0
r2   0   0   1   0
I wanted to do the same with recarray so I imported itertools and tried the following:
>>> data = np.array([('p1', 'p1', 'p2'), ('q1', 'q2', 'q1'), ('r1', 'r1', 'r2')],
...                 dtype=[('P',object),('Q',object),('R',object)]).view(np.recarray)
>>> groupby(data,key = (['R']+['P','Q'])).size().unstack(['P','Q'])
It doesn't work. How do I achieve a similar result without using pandas?
Let's back away from the fancy recarray and object type. It doesn't buy us anything.
The data can be a simple 2d array of strings:
In [711]: data = np.array([('p1', 'p1', 'p2'), ('q1', 'q2', 'q1'), ('r1', 'r1', 'r2')])
In [712]: data
Out[712]:
array([['p1', 'p1', 'p2'],
       ['q1', 'q2', 'q1'],
       ['r1', 'r1', 'r2']],
      dtype='<U2')
Better yet, make it a list of lists:
In [713]: data.tolist()
Out[713]: [['p1', 'p1', 'p2'], ['q1', 'q2', 'q1'], ['r1', 'r1', 'r2']]
itertools.groupby is designed to work with iterables such as lists. It can operate on arrays simply because it can iterate over them.
Please explain how you want to group these strings; the pandas groupby expression is not self-explanatory.
If I simply flatten the data array, I can group sequential values and count them:
In [726]: data.ravel()
Out[726]:
array(['p1', 'p1', 'p2', 'q1', 'q2', 'q1', 'r1', 'r1', 'r2'],
      dtype='<U2')
In [727]: g=itertools.groupby(data.ravel())
In [728]: [(k,list(v)) for k,v in g]
Out[728]:
[('p1', ['p1', 'p1']),
 ('p2', ['p2']),
 ('q1', ['q1']),
 ('q2', ['q2']),
 ('q1', ['q1']),
 ('r1', ['r1', 'r1']),
 ('r2', ['r2'])]
In [729]: g=itertools.groupby(data.ravel())
In [730]: [(k,len(list(v))) for k,v in g]
Out[730]: [('p1', 2), ('p2', 1), ('q1', 1), ('q2', 1), ('q1', 1), ('r1', 2), ('r2', 1)]
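Note that 'q1' shows up twice in that output: itertools.groupby only merges adjacent equal items, it does not collect all occurrences. A minimal sketch of two ways around that, sorting before grouping or using collections.Counter (the names flat, sorted_counts and counter_counts are just illustrative, and flat is a plain-list copy of the flattened data):

```python
import itertools
from collections import Counter

# Plain-list copy of the flattened data shown above
flat = ['p1', 'p1', 'p2', 'q1', 'q2', 'q1', 'r1', 'r1', 'r2']

# groupby only merges adjacent equal items, so sort first
# to bring identical strings together
sorted_counts = [(k, len(list(v))) for k, v in itertools.groupby(sorted(flat))]

# Counter tallies every occurrence regardless of order
counter_counts = Counter(flat)

print(sorted_counts)
print(counter_counts['q1'])
```

Either form gives one total per distinct string; Counter is usually the simpler choice when the order of the input does not matter.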
=============
Extending my answer to work row-wise:
In [738]: grps = [itertools.groupby(row) for row in data]
In [739]: [[(k, len(list(v))) for k,v in r] for r in grps]
[[('p1', 2), ('p2', 1)],
 [('q1', 1), ('q2', 1), ('q1', 1)],
 [('r1', 2), ('r2', 1)]]
This works for the object recarray version of data as well.
Oops - I misunderstood your 'row-wise' description. Even after rereading your last comment I don't understand what you want. It doesn't sound like an itertools.groupby problem at all. I thought you were counting strings like 'r1' and 'q2'. Apparently that's not the case.
====================
OK, a more focused attempt to recreate the pandas table.
Use itertools.product to generate 8 combinations of these 6 strings:
In [847]: pos = list(itertools.product(['r1','r2'],['p1','p2'],['q1','q2']))
In [848]: pos
Out[848]:
[('r1', 'p1', 'q1'),
 ('r1', 'p1', 'q2'),
 ('r1', 'p2', 'q1'),
 ('r1', 'p2', 'q2'),
 ('r2', 'p1', 'q1'),
 ('r2', 'p1', 'q2'),
 ('r2', 'p2', 'q1'),
 ('r2', 'p2', 'q2')]
Convert the dataframe to a list of lists, reordering the columns to (R, P, Q) so that each row matches the order used in pos:
In [849]: val=data.values[:,[2,0,1]].tolist()
In [850]: val
Out[850]: [['r1', 'p1', 'q1'], ['r1', 'p1', 'q2'], ['r2', 'p2', 'q1']]
Find which of the possible combinations are found in val:
In [852]: [[i, list(i) in val] for i in pos]
Out[852]:
[[('r1', 'p1', 'q1'), True],
 [('r1', 'p1', 'q2'), True],
 [('r1', 'p2', 'q1'), False],
 [('r1', 'p2', 'q2'), False],
 [('r2', 'p1', 'q1'), False],
 [('r2', 'p1', 'q2'), False],
 [('r2', 'p2', 'q1'), True],
 [('r2', 'p2', 'q2'), False]]
Rework the 'counts' as a 2x4 0/1 array (one row per R value, one column per (P, Q) pair):
In [853]: np.array([[list(i) in val] for i in pos]).reshape(2,-1).astype(int)
Out[853]:
array([[1, 1, 0, 0],
       [0, 0, 1, 0]])
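The membership test above only records presence (0/1), so a repeated (R, P, Q) record would still show as 1. If actual counts are wanted, as pandas size() gives, one possible pandas-free sketch uses collections.Counter. It assumes the three columns are available as plain lists; the names P, Q, R, counts, r_labels, pq_labels and table are illustrative and not from the original code:

```python
from collections import Counter
from itertools import product

# Columns as in the question's DataFrame (assumed to be plain lists here)
P = ['p1', 'p1', 'p2']
Q = ['q1', 'q2', 'q1']
R = ['r1', 'r1', 'r2']

# Count how often each (R, P, Q) record occurs
counts = Counter(zip(R, P, Q))

# Row labels are the distinct R values; column labels are
# every (P, Q) pair, including pairs that never occur
r_labels = sorted(set(R))
pq_labels = list(product(sorted(set(P)), sorted(set(Q))))

# A Counter returns 0 for missing keys, which plays the role of fillna(0)
table = [[counts[(r, p, q)] for p, q in pq_labels] for r in r_labels]

for r, row in zip(r_labels, table):
    print(r, row)
```

Wrapping the result in np.array(table) gives the same kind of 2x4 integer array as above, with pq_labels and r_labels serving as the column and row headers.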