python, list, sorting, curve

Rearranging list items based on a score to fit a function curve


Given that I have:

  • a list of words
  • points/scores that indicate "simplicity" for each word
  • a difficulty level (bin) for each word

E.g.

>>> words = ['apple', 'pear', 'car', 'man', 'average', 'older', 'values', 'coefficient', 'exponential']
>>> points = ['9999', '9231', '8231', '5123', '4712', '3242', '500', '10', '5']
>>> bins = [0, 0, 0, 0, 1, 1, 1, 2, 2]

Currently, the word list is ordered by the simplicity points.

What if I want to model the simplicity as a "quadratic curve", i.e. going from the highest points down to a low point and then back up to high? That is, I want to produce a word list, ordered by the corresponding points, that looks like this:

['apple', 'pear', 'average', 'coefficient', 'exponential', 'older', 'values', 'car', 'man']

I have tried this, but it's painfully crazy:

>>> from collections import Counter
>>> Counter(bins)[0]
4
>>> num_easy, num_mid, num_hard = Counter(bins)[0], Counter(bins)[1], Counter(bins)[2]
>>> num_easy
4
>>> easy_words = words[:num_easy]
>>> mid_words = words[num_easy:num_easy+num_mid]
>>> hard_words = words[-num_hard:]
>>> easy_words, mid_words, hard_words
(['apple', 'pear', 'car', 'man'], ['average', 'older', 'values'], ['coefficient', 'exponential'])
>>> easy_1 = easy_words[:int(num_easy/2)]
>>> easy_2 = easy_words[len(easy_1):]
>>> mid_1 = mid_words[:int(num_mid/2)]
>>> mid_2 = mid_words[len(mid_1):]
>>> new_words = easy_1 + mid_1 + hard_words + mid_2 + easy_2
>>> new_words
['apple', 'pear', 'average', 'coefficient', 'exponential', 'older', 'values', 'car', 'man']

Now imagine the number of bins is greater than 3, or that I want the "points" of the words to fit a sine-shaped curve.

Note that this is not exactly an NLP question, nor does it have anything to do with the Zipf distribution or with creating something to match or reorder the ranking of the words.

Imagine you have a list of integers with an object (in this case a word) mapped to each integer, and you want to reorder the list of objects to fit a quadratic curve.


Solution

  • I'd do something along these lines: sort the words by their points, take out every second one, reverse that half, and concatenate the two:

    >>> s = sorted(zip(map(int, points), words))
    >>> new_words = [word for p, word in list(reversed(s[::2])) + s[1::2]]
    # If you have lots of words, you'll be better off using
    # itertools such as islice and chain, but the principle should be evident here
    >>> new_words
    ['apple', 'car', 'average', 'values', 'exponential', 'coefficient', 'older', 'man', 'pear']

    Ordered as in:

    [(9999, 'apple'), (8231, 'car'), (4712, 'average'), (500, 'values'), (5, 'exponential'), (10, 'coefficient'), (3242, 'older'), (5123, 'man'), (9231, 'pear')]
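
The interleaving idea above can be sketched as a reusable function, together with one possible generalization to other shapes such as the sine curve mentioned in the question. This is only a sketch under assumptions: the names `valley_order` and `fit_to_curve` are hypothetical, and the rank-matching approach (sample the curve at evenly spaced positions, then give the highest-scoring item to the position where the curve is highest, and so on) is one way to read "fit a curve", not necessarily what the asker had in mind.

```python
import math
from itertools import chain, islice


def valley_order(items, scores):
    """High -> low -> high ordering, as in the answer, but with islice/chain."""
    s = sorted(zip(scores, items))              # ascending by score
    n = len(s)
    # Even ranks (0, 2, 4, ...) walked from the highest score downwards.
    falling = islice(reversed(s), (n - 1) % 2, None, 2)
    # Odd ranks (1, 3, 5, ...) walked from the lowest score upwards.
    rising = islice(s, 1, None, 2)
    return [item for _, item in chain(falling, rising)]


def fit_to_curve(items, scores, curve):
    """Place items so that their scores follow the shape of curve(x), x in [0, 1].

    Hypothetical helper: only the *ranks* of the scores are matched to the
    ranks of the sampled curve values, not the numeric values themselves.
    """
    items = list(items)
    n = len(items)
    # Sample the curve at n evenly spaced positions.
    targets = [curve(i / (n - 1) if n > 1 else 0.0) for i in range(n)]
    # Positions from highest curve value to lowest (ties keep left-to-right order).
    positions = sorted(range(n), key=lambda i: targets[i], reverse=True)
    # Items from highest score to lowest.
    ranked = sorted(zip(scores, items), reverse=True)
    result = [None] * n
    for pos, (_, item) in zip(positions, ranked):
        result[pos] = item
    return result


words = ['apple', 'pear', 'car', 'man', 'average', 'older', 'values',
         'coefficient', 'exponential']
points = ['9999', '9231', '8231', '5123', '4712', '3242', '500', '10', '5']
scores = [int(p) for p in points]

valley = valley_order(words, scores)
# A parabola with its minimum in the middle gives the same shape.
fitted = fit_to_curve(words, scores, lambda x: (x - 0.5) ** 2)
# One full sine period: rises to a peak, dips to a trough, then recovers.
wave = fit_to_curve(words, scores, lambda x: math.sin(2 * math.pi * x))

print(valley)
print(fitted)
print(wave)
```

With the sample data, `valley` and `fitted` both come out as `['apple', 'car', 'average', 'values', 'exponential', 'coefficient', 'older', 'man', 'pear']`. Because only ranks are matched, the bins are not needed, and any callable on [0, 1] can be passed as the curve.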