Search code examples
pythonpandasapplymulti-indexdrop

pandas drop a column according to one header value


I have this dataframe

name,01100MS,02200MS,02500MS,03100MS,06400MS
lat,626323,616720,616288,611860,622375
long,5188431,5181393,5173583,5165895,5152605
alt,915,1499,1310,1235,190
1920-01-01,1,4.1,4.41,4.441,4.4441
1920-01-02,2,4.2,4.42,4.442,4.4442
1920-01-03,3,4.3,4.43,4.443,4.4443
1920-01-04,4,4.4,4.44,4.444,4.4444
1920-01-05,5,4.5,4.45,4.445,4.4445
1920-01-06,6,4.6,4.46,4.446,4.4446
1920-01-07,7,4.7,4.47,4.447,4.4447
1920-01-08,8,4.8,4.48,4.448,4.4448
1920-01-09,9,4.9,4.49,4.449,4.4449
1920-01-10,10,5,4.5,4.45,4.445
1920-01-11,11,5.1,4.51,4.451,4.4451

I read it as:

 dfr     =  pd.read_csv(f_name,
                        parse_dates           = True,
                        index_col             = 0,
                        header                = [0,1,2,3],
                        infer_datetime_format = True,
                        cache_dates=True)

I would like to drop some columns according to the a threshold value in the 4th row, that is one of the heads due to the the fact that I use multiple index.

I would like to do something like:

for column in dfr:
    if dfr[column][2] <= 1300.:
        dfr = dfr.drop(column,axis=1) 

The problem is that I am not able to select the right "head" inside the multi heads. I would like also to do it in a smart way, avoiding in other word the cycle.


Solution

  • You can select forth level by Index.get_level_values and select columns with invert mask - greater like 1300 in DataFrame.loc:

    df = df.loc[:,df.columns.get_level_values(3).astype(int) > 1300]
    

    Or if dont need always converting to integers is possible set values before solution:

    df = df.rename(columns=int, level=3)
    print (df.columns)
    MultiIndex([('01100MS', '626323', '5188431',  915),
                ('02200MS', '616720', '5181393', 1499),
                ('02500MS', '616288', '5173583', 1310),
                ('03100MS', '611860', '5165895', 1235),
                ('06400MS', '622375', '5152605',  190)],
               names=['name', 'lat', 'long', 'alt'])
    
    df = df.loc[:,df.columns.get_level_values(3) > 1300]
    print (df)
    name       02200MS 02500MS
    lat         616720  616288
    long       5181393 5173583
    alt           1499    1310
    1920-01-01     4.1    4.41
    1920-01-02     4.2    4.42
    1920-01-03     4.3    4.43
    1920-01-04     4.4    4.44
    1920-01-05     4.5    4.45
    1920-01-06     4.6    4.46
    1920-01-07     4.7    4.47
    1920-01-08     4.8    4.48
    1920-01-09     4.9    4.49
    1920-01-10     5.0    4.50
    1920-01-11     5.1    4.51