Search code examples
pythonpandasdataframemulti-index

How can I drop repeating values in a column while keeping the data for its rows?


I have a dataframe that has column of repeating values/indexes and I want to group it by the 'Name' column but without performing any aggregation to it. I've looked at the Dataframe.groupby() function but from what I've searched, you are kind of forced to perform an aggregation. I've also tried Dataframe.set_index('Name') and then do Dataframe.sort_index(inplace = True) but it for some reason it returns a Nonetype I feel like it's something obvious but I can't quite figure it out.

This is my dataframe now:

Name Data1 Data2
A    .1    1.1
A    .2    1.2
A    .3    1.3
B    .6    1.6
B    .7    1.7
B    .8    1.8
C    1.0   2.0
C    1.1   2.1
C    1.2   2.2

This is what I want:

Name Data1 Data2
A    .1    1.1
     .2    1.2
     .3    1.3
B    .6    1.6
     .7    1.7
     .8    1.8
C    1.0   2.0
     1.1   2.1
     1.2   2.2

Solution

  • If want replace duplicated values to empty strings use Series.duplicated with Series.mask:

    df['Name'] = df['Name'].mask(df['Name'].duplicated(), '')
    print (df)
      Name  Data1  Data2
    0    A    0.1    1.1
    1         0.2    1.2
    2         0.3    1.3
    3    B    0.6    1.6
    4         0.7    1.7
    5         0.8    1.8
    6    C    1.0    2.0
    7         1.1    2.1
    8         1.2    2.2
    

    In MultiIndex for first look it also working, but NOT, values are only in default view NOT displayed in first level(s) if duplicated:

    df1 = df.set_index(['Name','Data1'])
    print (df1)
                Data2
    Name Data1       
    A    0.1      1.1
         0.2      1.2
         0.3      1.3
    B    0.6      1.6
         0.7      1.7
         0.8      1.8
    C    1.0      2.0
         1.1      2.1
         1.2      2.2
    

    Check docs, In [21]:, there is no omited data, only moved to MultiIndex:

    with pd.option_context('display.multi_sparse', False):
        print (df1)
    
                Data2
    Name Data1       
    A    0.1      1.1
    A    0.2      1.2
    A    0.3      1.3
    B    0.6      1.6
    B    0.7      1.7
    B    0.8      1.8
    C    1.0      2.0
    C    1.1      2.1
    C    1.2      2.2
    

    Or if convert values of MultiIndex to list:

    print (df1.index.tolist())
    [('A', 0.1), ('A', 0.2), ('A', 0.3), 
     ('B', 0.6), ('B', 0.7), ('B', 0.8), 
     ('C', 1.0), ('C', 1.1), ('C', 1.2)]