Apologies if this is a trivial task. I am very new to coding and Python and am learning as part of my dissertation.
I have a data frame and want to rank the values in each column by their closeness to a specified value, rather than in ascending or descending order.
I am working on a way to compare running/cycling routes. As part of this process, I am trying to find how a query route compares to a target route based on a few different attributes: distance, elevation gain, elevation loss and gradient. My resulting data frame shows the error between the two routes for each attribute in the comparison (i.e. [target route value - query route value] / target route value). The problem I am currently facing is ranking these results. As a perfect match would be a value of 0, I want to rank the values by their closeness to 0.
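As a rough sketch of that error calculation (the route values below are made up purely for illustration, and the names `target` and `queries` are hypothetical, not taken from the real data), it could look something like this:

```python
import pandas as pd

# Hypothetical attribute values, for illustration only (not the real route data).
target = pd.Series({'distance': 10.0, 'elevation_gain': 200.0,
                    'elevation_loss': 180.0, 'gradient': 2.0})
queries = pd.DataFrame({'distance': [8.5, 10.0],
                        'elevation_gain': [250.0, 200.0],
                        'elevation_loss': [90.0, 180.0],
                        'gradient': [1.5, 2.0]})

# rsub computes (target - query); dividing by target gives the relative error.
# Both steps align the Series index with the DataFrame columns.
errors = queries.rsub(target, axis='columns').div(target, axis='columns')
print(errors)
```

Here the second row matches the target exactly, so every error in that row is 0.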
The data frame to be ranked:
import pandas as pd

scores = pd.DataFrame({'distance': [0.15, 0.07, -0.09, 0],
                       'elevation_gain': [-0.19, -8.39, -0.86, 0],
                       'elevation_loss': [-3.73, -2.51, -0.16, 0],
                       'gradient': [0.12, 0.39, 2.77, 0]})
In this case, the 4th route is the query route itself, so its result is a perfect match and should be ranked 1st.
As there are negative values, I don't think a plain descending ranking would be suitable.
What I am aiming for is:
ranks = pd.DataFrame({'distance': [4, 2, 3, 1],
                      'elevation_gain': [2, 4, 3, 1],
                      'elevation_loss': [4, 3, 2, 1],
                      'gradient': [2, 3, 4, 1]})
(Apologies, I don't know how to display these data frames to make them easier to digest.)
I could then create a new column summing the ranks, and the lowest total would indicate the best match.
Thanks for any help in advance!
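Try this. Taking the absolute value of each error turns "closeness to 0" into an ordinary ascending ranking, which can then be applied to each column: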
ranks = scores.abs().apply(pd.Series.rank).astype(int)
ranks
Output:
   distance  elevation_gain  elevation_loss  gradient
0         4               2               4         2
1         2               4               3         3
2         3               3               2         4
3         1               1               1         1
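If the value to rank against is something other than 0, or you want the summed-rank column mentioned in the question, the same idea can be extended. The following is only a sketch under a couple of assumptions: `target` is a hypothetical name for the reference value, and `method='min'` is one possible choice for handling ties (the default, `'average'`, can produce fractional ranks such as 1.5, which `astype(int)` would silently truncate).

```python
import pandas as pd

scores = pd.DataFrame({'distance': [0.15, 0.07, -0.09, 0],
                       'elevation_gain': [-0.19, -8.39, -0.86, 0],
                       'elevation_loss': [-3.73, -2.51, -0.16, 0],
                       'gradient': [0.12, 0.39, 2.77, 0]})

target = 0  # assumed reference value; a perfect match in this data is 0

# Rank each column by absolute distance from the target.
# method='min' gives tied values the same whole-number rank.
ranks = (scores - target).abs().rank(method='min').astype(int)

# Sum the ranks across attributes; the lowest total is the closest overall match.
ranks['total'] = ranks.sum(axis=1)
best = ranks['total'].idxmin()
print(ranks)
print('Best match: row', best)
```

With the sample data, the query route (row 3) has the lowest total, while the other three routes happen to tie on their totals, so a tie-breaking rule may be needed in practice.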