python pandas character multiple-columns rename

Rename groups of columns while keeping the same column order in pandas


Hi, I am removing some characters from the column names in my DataFrame, but there are a lot of columns and the number of characters to remove differs between groups of columns, so I need to handle each group separately. I used the code below for the first 10 columns, but the same approach did not work for columns 19-29, because the new names end up replacing the names of the first columns:

new_names = dfr.iloc[:,0:10].rename(columns=lambda x: x[75:]).columns #first 10 columns
dfr.rename(columns=dict(zip(dfr.columns,new_names)),inplace=True)

This is the code that overwrites the first columns with the names from the second group:

new_names = dfr.iloc[:,19:29].rename(columns=lambda x: x[38:]).columns #from  19-29 columns
dfr.rename(columns=dict(zip(dfr.columns,new_names)),inplace=True)

I need to do the same for 5 more groups. Any help finishing this would be appreciated.


Solution

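  • Start with sample data (the column names are drawn at random, so duplicates are expected):

    import numpy as np
    import pandas as pd

    np.random.seed(2020)

    L = ['abdef','trasdfg','ssfgh','dfghj','jhgfdsa','kjhtf','ghrtd']
    c = np.random.choice(L, size=10)
    df = pd.DataFrame(np.random.randint(100, 105, size=(3, 10)), columns=c)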
    print (df)
       abdef  abdef  dfghj  ghrtd  dfghj  dfghj  kjhtf  dfghj  abdef  kjhtf
    0    100    100    100    102    101    103    103    102    103    100
    1    104    104    100    104    101    101    102    101    102    104
    2    104    102    103    104    101    104    101    103    102    100
    

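    Specify how many columns belong to each group (here 3, 4 and 3) and, in a second list, how many leading letters to keep from the names in each group. To remove the first letters instead, as in the question, slice with .str[c:] rather than .str[:c] in the last step: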

    vals = [3,4,3]
    filt=[2,3,1]
    

    Then use the cumulative sum to build a (start, end) position pair for each group, and zip these pairs with the letter counts:

    cols = np.cumsum([0] + vals)
    
    print (list( zip(zip(cols, cols[1:]), filt)))
    [((0, 3), 2), ((3, 7), 3), ((7, 10), 1)]
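
On recent NumPy versions (2.0 and later), printing this list may show each boundary wrapped as `np.int64(...)` rather than as a bare number. The boundaries are the same either way. As a minimal sketch, assuming Python 3.8+ and using the standard library's `itertools.accumulate` in place of `np.cumsum`, the same pairs can be built with plain integers (the names `bounds` and `slices` are illustrative):

```python
from itertools import accumulate

vals = [3, 4, 3]   # number of columns in each group
filt = [2, 3, 1]   # number of leading letters to keep in each group

# Running totals give the group boundaries: [0, 3, 7, 10]
bounds = list(accumulate(vals, initial=0))

# Pair consecutive boundaries into (start, end) slices, one per group
slices = list(zip(zip(bounds, bounds[1:]), filt))
print(slices)
# [((0, 3), 2), ((3, 7), 3), ((7, 10), 1)]
```
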
    

    Finally, build the new column names in a list comprehension and assign them back:

    new = [x for (a,b),c in zip(zip(cols, cols[1:]), filt) for x in df.columns[a:b].str[:c]]
    
    
    df.columns = new
    print (df)
        ab   ab   df  ghr  dfg  dfg  kjh    d    a    k
    0  100  100  100  102  101  103  103  102  103  100
    1  104  104  100  104  101  101  102  101  102  104
    2  104  102  103  104  101  104  101  103  102  100
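
The example above keeps the first letters of each name and assumes the groups are contiguous. The question instead removes leading characters (`x[75:]`, `x[38:]`) from groups at specific positions (0-10, 19-29), with other columns in between. The question's attempt goes wrong because `zip(dfr.columns, new_names)` pairs the new names with the first columns of `dfr`, whichever slice they came from. A minimal sketch of a position-based variant follows. The DataFrame, its column names and the numbers in `groups` are hypothetical stand-ins, not the asker's data:

```python
import pandas as pd

# Hypothetical stand-in for the asker's `dfr`: long prefixes on some columns
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6]],
    columns=["aaa_x1", "aaa_x2", "keep_me", "bbbbb_y1", "bbbbb_y2", "other"],
)

# (start, stop, n_remove): drop the first n_remove characters from the names
# of the columns at positions start..stop-1. Values here are illustrative.
groups = [(0, 2, 4), (3, 5, 6)]

new_cols = list(df.columns)
for start, stop, n_remove in groups:
    # Work by position, so columns outside the slice are never touched
    new_cols[start:stop] = [name[n_remove:] for name in new_cols[start:stop]]

# Assigning the whole list back keeps the original column order
df.columns = new_cols
print(df.columns.tolist())
# ['x1', 'x2', 'keep_me', 'y1', 'y2', 'other']
```

For the question's data, `groups` would presumably start with `(0, 10, 75)` and `(19, 29, 38)`, followed by one tuple for each of the remaining groups. Assigning the full list to `df.columns` avoids the name-based mapping that `rename` uses, so duplicate column names do not interfere.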