Adding elements from dataframe to a set object


I am trying to add my labels to a set object, but when I do this I get a weird output. I want the set to contain every label, with no repeats.

types = set()
for t in frame4['practice']:
    types.update(t)
types
{'1',
 '3',
 'A',
 'B',
 'C',
 'D',
 'E',
 'F',
 'G',
 'I',
 'L',
 'M',
 'N',
 'O',
 'P',
 'S',
 'T',
 'W',
 'Z',
 '_',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'y'}

This is what the practice column of the dataframe looks like. There are some repetitions, since the values are labels, and all NaN elements were removed.

2        Identifier_Cookie_or_similar_Tech_1stParty
3                    Identifier_IP_Address_1stParty
4        Identifier_Cookie_or_similar_Tech_1stParty
8        Identifier_Cookie_or_similar_Tech_3rdParty
10                             Demographic_3rdParty
                            ...                    
21612                          Demographic_1stParty
21613                          Demographic_3rdParty
21614    Identifier_Cookie_or_similar_Tech_1stParty
21615    Identifier_Cookie_or_similar_Tech_3rdParty
21616    Identifier_Cookie_or_similar_Tech_1stParty
Name: practice, Length: 10201, dtype: object

Solution

  • update() expects an iterable of values, such as a list:

    types.update( [t] )
    

    When you pass a single string, update() treats it as an iterable of characters and adds each character separately.


    You can also do it without the for-loop, because the column itself is an iterable of labels:

    types.update( frame4['practice'] )
    

    or build the set directly:

    types = set( frame4['practice'] )
    

    You can also skip set() and use .unique(), which returns an array of the distinct values in order of first appearance:

    types = frame4['practice'].unique()
    

    If you want to remove duplicate values but keep the result as a Series, use .drop_duplicates() (keep='last' retains the last occurrence of each value):

    df = df['practice'].drop_duplicates(keep='last')
    

    Minimal working example:

    import pandas as pd
    
    df = pd.DataFrame({
        'practice': ['abc', 'xyz', 'qrt', 'abc', '123', 'qrt']
    })
    
    print('--- 1 ---')
    types = set( df['practice'] )
    print(types)
    
    print('--- 2 ---')
    types = set()
    types.update( df['practice'] )
    print(types)
    
    print('--- 3 ---')
    types = df['practice'].unique()
    print(types)
    
    print('--- 4 ---')
    df = df['practice'].drop_duplicates(keep='last')
    print(df)
    

    Result (the order of elements in a set is arbitrary and may differ between runs):

    --- 1 ---
    {'qrt', 'abc', 'xyz', '123'}
    --- 2 ---
    {'qrt', 'abc', 'xyz', '123'}
    --- 3 ---
    ['abc' 'xyz' 'qrt' '123']
    --- 4 ---
    1    xyz
    3    abc
    4    123
    5    qrt
    Name: practice, dtype: object
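
To make the cause of the original output concrete, here is a minimal sketch contrasting the two behaviours. The `practice` Series is a hypothetical stand-in for `frame4['practice']` from the question, and the names `chars` and `labels` are used only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for frame4['practice'] from the question.
practice = pd.Series([
    'Demographic_1stParty',
    'Demographic_3rdParty',
    'Demographic_1stParty',
])

# update() iterates over its argument; a string yields single characters,
# so this reproduces the unexpected output from the question.
chars = set()
for t in practice:
    chars.update(t)

# add() inserts the whole string as one element
# (same effect as labels.update([t])).
labels = set()
for t in practice:
    labels.add(t)

print(sorted(chars))   # individual characters
print(sorted(labels))  # whole labels, duplicates removed
```

Here `labels` ends up equal to `set(practice)`, so the loop is only useful if you need to filter or transform each label before adding it.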