Search code examples
pythonpandassortingcountmultiple-columns

Replace or change position in columns and sort the index Python Pandas


I'm having troubles with the sorting of my counts generating my df so basically I'm running a for to use value_counts in each column but the problem comes when the is a few colums in the new df that are not sorted in a single way so there is columns like this:

enter image description here

I would like to have just one order for this columns either way yes, no don't know or any order but the same for the 3 columns and also they share the 3 options just because it doesn't appear any value with Don't know doest appear in the first 2 columns yes,no so I would like to add Don't know at the end, copy or replace the other 2 columns with the same options as column 3. I tried to explai it the best I could, this is my code:

li = []

for i in range(0, len(df.columns)):
    value_counts = df.iloc[:, i].value_counts().to_frame().reset_index()
    li.append(value_counts)

Solution

  • EDIT:

    Because different values in columns is problematic join to one DataFrame, you can create multiple ouputs depends of data:

    np.random.seed(202)
        
    cat1 = ['more','less','no idea','same','other']
    cat2 = ['cat1','cat2','cat3']
    cat3 = ['Yes','No', "I don't know"]
    
    df1 = pd.DataFrame(np.random.choice(cat1, size=(10, 2)), columns=list('BD'))
    df2 = pd.DataFrame(np.random.choice(cat2, size=(10, 3)), columns=list('CFH'))
    df3 = pd.DataFrame(np.random.choice(cat3, size=(10, 4)), columns=list('AIEG'))
    df = pd.concat([df1, df2, df3], axis=1).sort_index(axis=1)
    

    print (df)
                  A      B     C        D             E     F             G     H  \
    0           Yes   same  cat2    other            No  cat1  I don't know  cat2   
    1  I don't know  other  cat1     more            No  cat1            No  cat3   
    2  I don't know   less  cat3     less            No  cat3            No  cat1   
    3            No   less  cat1     more            No  cat2            No  cat3   
    4            No  other  cat1     same           Yes  cat1           Yes  cat2   
    5            No   less  cat3    other            No  cat2           Yes  cat3   
    6  I don't know  other  cat2    other  I don't know  cat1  I don't know  cat1   
    7  I don't know   more  cat2  no idea            No  cat3  I don't know  cat2   
    8            No   less  cat3     same  I don't know  cat1           Yes  cat3   
    9           Yes  other  cat2     same  I don't know  cat3           Yes  cat3   
    
                  I  
    0  I don't know  
    1           Yes  
    2  I don't know  
    3  I don't know  
    4  I don't know  
    5           Yes  
    6           Yes  
    7            No  
    8           Yes  
    9  I don't know  
    

    #columns with Yes, No, I don't know values to seperate DataFrame
    mask = df.isin(['Yes','No', "I don't know"]).all()
    cols1 = df.columns[mask]
    
        
    df11 = pd.concat({i: df[i].value_counts() for i in cols1}, axis=1)
    print (df11)
                  A  E  G  I
    No            4  6  3  1
    I don't know  4  3  3  5
    Yes           2  1  4  4
    
    #all another columns to another DataFrame
    cols2 = df.columns[~mask]
    df22 = pd.concat({i: df[i].value_counts() for i in cols2}, axis=1)
    print (df22)
               B    C    D    F    H
    other    4.0  NaN  3.0  NaN  NaN
    less     4.0  NaN  1.0  NaN  NaN
    more     1.0  NaN  2.0  NaN  NaN
    same     1.0  NaN  3.0  NaN  NaN
    cat2     NaN  4.0  NaN  2.0  3.0
    cat3     NaN  3.0  NaN  3.0  5.0
    cat1     NaN  3.0  NaN  5.0  2.0
    no idea  NaN  NaN  1.0  NaN  NaN