I have a list of urls->small numbers. numbers represents the importance of each URL
url, value
https://mywebsite.com/p/1, 0.00212
https://mywebsite.com/p/2, 0.00208
https://mywebsite.com/p/3, 0.00201
https://mywebsite.com/p/4, 0.00138
https://mywebsite.com/p/5, 0.00067
https://mywebsite.com/p/1, 0.00001
...
let's say the sum of values = 1 I would like to represent those number within a scale of 0-10 and maintaining a kind of ratio difference between numbers
url, value, scaled_value
https://mywebsite.com/p/1, 0.00212, 10
https://mywebsite.com/p/2, 0.00208, 9
https://mywebsite.com/p/3, 0.00201, 9
https://mywebsite.com/p/4, 0.00138, 6
https://mywebsite.com/p/5, 0.00067, 3
https://mywebsite.com/p/1, 0.00001, 1
...
something like this (I don't know if the ratio difference here is maintained tho) any one can help with the maths ? thanks
#Update
thanks to @annZen help, I tried two approach, but the results are different I don't know why. if someone can help ?
here are the two formulas I used :
res1 = round(x*9/maxpri)+1
res2 = round(((x-minpri)/(maxpri-minpri))*10, 2)
Here is a way:
with open('file.txt', 'r') as p:
lst = p.read().splitlines() # List all the lines of the file
lst2 = [float(i.split(', ')[1]) for i in lst[1:]] # List all the floats
num = [round(a*9/max(lst2))+1 for a in lst2] # List all the scaled numbers
for i,(l,n) in enumerate(zip(lst,['scaled_value']+num)):
lst[i] = f"{l}, {n}" # Add the 'scaled_value' column
with open('file.txt', 'w') as p:
p.write('\n'.join(lst)) # Write the updated data into the file
Before:
url, value
https://mywebsite.com/p/1, 0.00212
https://mywebsite.com/p/2, 0.00208
https://mywebsite.com/p/3, 0.00201
https://mywebsite.com/p/4, 0.00138
https://mywebsite.com/p/5, 0.00067
https://mywebsite.com/p/1, 0.00001
After:
url, value, scaled_value
https://mywebsite.com/p/1, 0.00212, 10
https://mywebsite.com/p/2, 0.00208, 10
https://mywebsite.com/p/3, 0.00201, 10
https://mywebsite.com/p/4, 0.00138, 7
https://mywebsite.com/p/5, 0.00067, 4
https://mywebsite.com/p/1, 0.00001, 1
UPDATE:
The part of my code that does the converting is:
num = [round(a*9/max(lst2))+1 for a in lst2]
where lst2
is simply the list of floats extracted from the file. You updated the question for me to explain the difference between
res1 = round(x*9/maxpri)+1
res2 = round(((x-minpri)/(maxpri-minpri))*10, 2)
Lets first see then in my list comprehension:
num1 = [round(x*9/max(lst2))+1 for x in lst2]
num2 = [round(((x-min(lst2))/(max(lst2)-min(lst2)))*10, 2) for x in lst2]
print(num1)
print(num2)
Output:
[10, 10, 10, 7, 4, 1]
[10.0, 9.81, 9.48, 6.49, 3.13, 0.0]
The first clearest difference is that i rounded my answer to the nearest integer. Without it, it would be:
num1 = [round(x*9/max(lst2), 2)+1 for x in lst2]
num2 = [round(((x-min(lst2))/(max(lst2)-min(lst2)))*10, 2) for x in lst2]
print(num1)
print(num2)
Output:
[10.0, 9.83, 9.53, 6.86, 3.84, 1.04]
[10.0, 9.81, 9.48, 6.49, 3.13, 0.0]
The values are now very close, but there's one more thing. My code assumes that the minimum value for the scaled values is 1
, because I saw in your post https://mywebsite.com/p/1, 0.00001, 1
. I now realize that you stated 0-10, not 1-10. So that another is to change the 9
(10-1=9) to a 10
(10-0=10), and removing the +1
:
round(x*9/max(lst2), 2)+1
round(x*10/max(lst2), 2)
num1 = [round(x*10/max(lst2), 2) for x in lst2]
num2 = [round(((x-min(lst2))/(max(lst2)-min(lst2)))*10, 2) for x in lst2]
print(num1)
print(num2)
Output:
[10.0, 9.81, 9.48, 6.51, 3.16, 0.05]
[10.0, 9.81, 9.48, 6.49, 3.13, 0.0]
Still a little different, that's because I assumed the minimum value in your column is 0
, because you didn't show your whole array. But in this case, it's 0.00001
. So, go with:
num = [round(((x-min(lst2))/(max(lst2)-min(lst2)))*10, 2) for x in lst2]