python pandas dataframe frozenset

dataframe to frozenset


I want to convert each row of a dataframe into a frozenset of (column, value) pairs, so that the column names are kept inside each frozenset.

Example

import pandas as pd
x = pd.DataFrame(data=dict(sample=["A", "B", "C"], lane=[1, 1, 2]))
>>> x
   lane sample
0     1      A
1     1      B
2     2      C

And I would like to get something like this:

x2= {frozenset({("sample", "A"), ("lane", 1)}),
    frozenset({("sample", "B"), ("lane", 1)}),
    frozenset({("sample", "C"), ("lane", 2)})}

>>> x2
{frozenset({('sample', 'B'), ('lane', 1)}), frozenset({('sample', 'A'), ('lane', 1)}), frozenset({('lane', 2), ('sample', 'C')})}

I tried x.apply(frozenset, 1), but it gives me this:

0    (1, A)
1    (1, B)
2    (C, 2)
dtype: object

Any help would be appreciated. Thank you.
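Why the apply attempt loses the column names, as far as I can tell: with axis=1, apply passes each row to the function as a Series whose index holds the column names, and iterating a Series yields only its values. Series.items() yields (label, value) pairs instead. The sketch below (variable names such as values_only and with_labels are just for illustration) checks this on the first row and then applies the same idea to every row:

```python
import pandas as pd

x = pd.DataFrame(data=dict(sample=["A", "B", "C"], lane=[1, 1, 2]))

# With axis=1, apply hands each row over as a Series indexed by column name.
# Take the first row to see what frozenset does with it.
row = x.iloc[0]

# Iterating a Series yields only the values: 'A' and 1, no column names.
values_only = frozenset(row)

# Series.items() yields (label, value) pairs, so the column names survive.
with_labels = frozenset(row.items())

# The same idea row by row: apply returns a Series of frozensets,
# and set() turns it into the set of frozensets from the question.
x2 = set(x.apply(lambda r: frozenset(r.items()), axis=1))
print(x2)
```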


Solution

  • You can convert your dataframe to a list of per-row dicts (the 'records' orientation) with pd.DataFrame.to_dict:

    x.to_dict('records')
    
    # [{'sample': 'A', 'lane': 1}, 
    #  {'sample': 'B', 'lane': 1}, 
    #  {'sample': 'C', 'lane': 2}]
    

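    Since this returns a list of dicts, you can then build a frozenset from each dict's (key, value) pairs:

    # Python 3: use dict.items() (iteritems() was Python 2 only); map() is lazy,
    # so wrap it in list(). Newer pandas versions no longer accept the 'r' abbreviation.
    list(map(lambda y: frozenset(y.items()), x.to_dict('records')))
    
    # element order inside each frozenset may vary:
    # [frozenset({('sample', 'A'), ('lane', 1)}), 
    #  frozenset({('sample', 'B'), ('lane', 1)}), 
    #  frozenset({('sample', 'C'), ('lane', 2)})]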
    

    Or, using a set comprehension, if your output should be a set of frozensets (as in the question):

    {frozenset(y.items()) for y in x.to_dict('records')}
    
    # {frozenset({('sample', 'C'), ('lane', 2)}),
    #  frozenset({('sample', 'B'), ('lane', 1)}),
    #  frozenset({('sample', 'A'), ('lane', 1)})}
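A further option, which is not part of the answer above and is offered only as a sketch: skip the intermediate dicts and pair the column names with each row tuple from DataFrame.itertuples(index=False, name=None), which yields plain tuples of values in column order:

```python
import pandas as pd

x = pd.DataFrame(data=dict(sample=["A", "B", "C"], lane=[1, 1, 2]))

# itertuples(index=False, name=None) yields one plain tuple of values per row,
# ordered like x.columns; zip() pairs each value with its column name.
x2 = {frozenset(zip(x.columns, values))
      for values in x.itertuples(index=False, name=None)}
print(x2)
```

As with the set comprehension above, any two identical rows collapse into a single frozenset, because the result is a set; use a list instead if duplicate rows must be kept.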