Search code examples
pythonpandassortingdataframe

Pandas sort_values does not sort numbers correctly


I have sorted a dataframe by a specific column but the answer that pandas spits out is not exactly correct.

Here is the code I have used:

league_dataframe.sort_values('overall_league_position')

The result that the sort method yields values in column 'overall league position' are not sorted in ascending order which is the default for the method.

enter image description here

What am I doing wrong?


Solution

  • For whatever reason, you seem to be working with a column of strings, and sort_values is returning you a lexsorted result.

    Here's an example.

    df = pd.DataFrame({"Col": ['1', '2', '3', '10', '20', '19']})
    df
    
      Col
    0   1
    1   2
    2   3
    3  10
    4  20
    5  19
    
    df.sort_values('Col')
    
      Col
    0   1
    3  10
    5  19
    1   2
    4  20
    2   3
    

    The remedy is to convert it to numeric, either using .astype or pd.to_numeric.

    df.Col = df.Col.astype(float)
    

    Or,

    df.Col = pd.to_numeric(df.Col, errors='coerce')
    
    df.sort_values('Col')
    
       Col
    0    1
    1    2
    2    3
    3   10
    5   19
    4   20
    

    The only difference b/w astype and pd.to_numeric is that the latter is more robust at handling non-numeric strings (they're coerced to NaN), and will attempt to preserve integers if a coercion to float is not necessary (as is seen in this case).